Bug 469848
Summary: [RHEL5.2] nfs_getattr() hangs during heavy write workloads
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
Status: CLOSED ERRATA
Severity: urgent
Priority: urgent
Reporter: Flavio Leitner <fleitner>
Assignee: Peter Staubach <staubach>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: adestefano, dhoward, dmair, emcnabb, harshula, james.leddy, jlayton, jpirko, kurt.orita, masanari_iida, rlerch, rprice, rwheeler, skylar2, steved, tao, tbaumann, terry.johnson, wnguyen
Target Milestone: rc
Keywords: ZStream
Doc Type: Bug Fix
Doc Text:
The required semantics indicate that a process which does stat, write, stat, should see a different mtime on the file in the results from the second stat call compared to the mtime in the results from the first stat call. File times in NFS are maintained strictly by the server, so the file mtime will not be updated until the data has been transmitted to the server via the WRITE NFS protocol operation. Simply copying data into the pagecache is not sufficient to cause the mtime to be updated. This is one place where NFS differs from local file systems. Therefore, an NFS filesystem which is under a heavy write workload may result in stat calls having a high latency.
Last Closed: 2009-09-02 08:11:54 UTC
Bug Blocks: 468301, 483701, 485920, 486926, 546233
Description
Flavio Leitner
2008-11-04 13:15:41 UTC
I'm not an NFS expert (nor do I play one on TV), but would it be possible to simply send the first (and only the first) dirty page (if at least one dirty page was present) for a given NFS inode any time a local stat is made on an NFS client? Wouldn't a one-page update force mtime on the host?

This event sent from IssueTracker by jwest issue 234012

Sending one page would make it look close to correct for that one stat call. Then what happens? The rest of the pages eventually get flushed. You do another stat call and... WTF? The file has changed? Who wrote to it? The bottom line is that on a getattr we need to sync the state of the file out to the server, and that means flushing all pending writes. (Peter, correct me if I'm wrong here.)

This is correct, Jeff. Perhaps I oversimplified when describing the semantics that are required when handling the mtime.

*** Bug 464251 has been marked as a duplicate of this bug. ***

*** Bug 467374 has been marked as a duplicate of this bug. ***

Peter, how goes the alternative patch/solution? IBM is willing to help test the next solution we come up with. --jwest

The customer is providing access to this NFS mount to multiple clients via FTP. As such, they have a large mix of FTP-related directory read activity combined with the potential for large file writes. One remote client doing a large upload effectively stalls access for the other clients. We have already set expectations that they will still see performance issues if a large write is happening. They have migrated from a RHEL4 environment (where they do not see this severe a stall) and would be happy to simply get the responsiveness closer to what it was in RHEL4. Our testing showed that the previously proposed patch accomplished this goal.

This event sent from IssueTracker by jwest issue 234012

Thank you for the description of the environment and the situation. This helps me to better understand it.
Yes, the previously proposed patch would make things generally better. There might still be very large delays, but they would mostly be finite in nature. If I can't get my patch accepted upstream in a reasonable amount of time, then we will go with the proposed patch for the time being, and I will continue to pursue the better option.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Updating PM score.

in kernel-2.6.18-132.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.

The x86_64 version of the 2.6.18-132 kernel has not made any difference in behavior:

Before:
    # time cp /tmp/bigfile /usr/local/tmp
    real 1m44.479s
    user 0m0.015s
    sys 0m0.246s
    # time ls /usr/local/tmp
    a b bigfile c d
    real 1m38.688s
    user 0m0.000s
    sys 0m0.009s

After:
    # time cp /tmp/bigfile /usr/local/tmp
    real 1m46.442s
    user 0m0.017s
    sys 0m0.222s
    # time ls /usr/local/tmp
    a b bigfile c d
    real 1m44.069s
    user 0m0.000s
    sys 0m0.012s

(Note: the ls was run manually in a second window once the cp was started, so a 1-2 second variation in test times would be expected.)

This event sent from IssueTracker by jwest issue 234012

Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
The required semantics indicate that a process which does stat, write, stat, should see a different mtime on the file in the results from the second stat call compared to the mtime in the results from the first stat call. File times in NFS are maintained strictly by the server, so the file mtime will not be updated until the data has been transmitted to the server via the WRITE NFS protocol operation. Simply copying data into the pagecache is not sufficient to cause the mtime to be updated. This is one place where NFS differs from local file systems. Therefore, an NFS filesystem which is under a heavy write workload may result in stat calls having a high latency, sometimes up to 30 seconds.

This looks fine, except for the very last clause. Please remove everything after the last comma in the paragraph. The latencies can easily be much higher than that.

Removed ", sometimes up to 30 seconds".

Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-The required semantics indicate that a process which does stat, write, stat, should see a different mtime on the file in the results from the second stat call compared to the mtime in the results from the first stat call. File times in NFS are maintained strictly by the server, so the file mtime will not be updated until the data has been transmitted to the server via the WRITE NFS protocol operation. Simply copying data into the pagecache is not sufficient to cause the mtime to be updated. This is one place where NFS differs from local file systems. Therefore, an NFS filesystem which is under a heavy write workload may result in stat calls having a high latency, sometimes up to 30 seconds.
+The required semantics indicate that a process which does stat, write, stat, should see a different mtime on the file in the results from the second stat call compared to the mtime in the results from the first stat call. File times in NFS are maintained strictly by the server, so the file mtime will not be updated until the data has been transmitted to the server via the WRITE NFS protocol operation. Simply copying data into the pagecache is not sufficient to cause the mtime to be updated. This is one place where NFS differs from local file systems. Therefore, an NFS filesystem which is under a heavy write workload may result in stat calls having a high latency.

This looks fine to me.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html

*** Bug 491121 has been marked as a duplicate of this bug. ***

(In reply to comment #37) The original author of comment #37 subsequently made these observations in IssueTracker:

"I just executed an additional test with a much larger file, and did see that the behavior was very dependent upon how long I let the write go before trying the ls. I suspect that I had delayed long enough in my previous test to allow the system to buffer 100% of the file before I issued the ls request."

"The customer fell into the same trap that I did on their initial test (too small a file used for the test resulted in the entire file being buffered before the ls command == no change in behavior, as the new code still waits for the buffered data to be flushed).
Using a larger file did indeed demonstrate a positive change in behavior."