Bug 736578
Summary: | writes to single file from multiple processes clobber each other | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Max Matveev <makc> | ||||
Component: | kernel | Assignee: | Steve Dickson <steved> | ||||
Status: | CLOSED ERRATA | QA Contact: | Chao Ye <cye> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 5.6 | CC: | bfields, borgan, ccui, eguan, harshula, nfs-maint, nmurray, rwheeler, smayhew, sprabhu, steved | ||||
Target Milestone: | rc | Keywords: | Regression, Reopened | ||||
Target Release: | 5.9 | ||||||
Hardware: | All | ||||||
OS: | Linux | ||||||
Whiteboard: | |||||||
Fixed In Version: | kernel-2.6.18-380.el5 | Doc Type: | Bug Fix | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | |||||||
: | 1007035 (view as bug list) | Environment: | |||||
Last Closed: | 2014-09-16 00:32:48 UTC | Type: | --- | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Bug Depends On: | 1007035, 1007039 | ||||||
Bug Blocks: | 743405, 807974, 928283, 1049884 | ||||||
Attachments: |
|
Description
Max Matveev
2011-09-08 07:14:48 UTC
This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.8 and Red Hat does not plan to fix this issue the currently developed update. Contact your manager or support representative in case you need to escalate this bug. Hey Max, I'm not seeing this type of behaviour with a later kernel (2.6.18-310.el5). Would you mind retesting with a later kernel? tia... Apologies for tardiness - was away for a while... FWIW, on a VM with 2 virtual CPUs [root@rebecca ~]# nfsstat -m /mnt/tmp from 192.168.122.46:/tmp Flags: rw,vers=3,rsize=131072,wsize=131072,hard,proto=tcp,timeo=600,retrans=2,sec=sys,addr=192.168.122.46 [root@rebecca ~]# uname -a Linux rebecca 2.6.18-308.8.2.el5 #1 SMP Tue May 29 11:54:17 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux I'm still seeing very slow writes and I've got tcpdump full of FILE_SYNCs to prove it. Ok, I can reproduce this too, and it's also a problem in current mainline kernels. We'll have to pursue a fix there first and then we can discuss backporting it for RHEL5 and 6. Max, didn't you start a discussion thread on this upstream? If so, then reviving that might be the best first step. (In reply to comment #0) > I think it was e992fe54551ca2427ca532de6ef88b1e1d2a0487 which > introduced the problem - in this case there is no locking involved > but the test in nfs_flush_incompatible > Yes, that's almost assuredly when the problem was introduced. > if (req->wb_page != page || ctx != req->wb_context || > req->wb_lock_context->lockowner != current->files || > req->wb_lock_context->pid != current->tgid) > status = nfs_wb_page(inode, page); > > evaluates to true because we're indeed dealing with differnt set of files > and different process Id. > Yes. There's also a similar problem in nfs_can_coalesce_requests()... > I think we need to check if there are any locks in the lock context > before comparing the pointers and pids. > A couple of problems with that idea: 1) we don't have a list of locks associated with a particular lockowner. Getting that info is not as direct as one would hope. It'll probably involve taking locks and walking lists, etc. 2) Even if we did, will we have problems if someone sets a lock after the page has been dirtied and the lockowner set? I'll need to ponder this part. (In reply to comment #6) > Max, didn't you start a discussion thread on this upstream? If so, then > reviving that might be the best first step. No - I was planing to spend a bit more time researching it before going to Trond but it never quite happen. (In reply to comment #7) > > I think we need to check if there are any locks in the lock context > > before comparing the pointers and pids. > > A couple of problems with that idea: > > 1) we don't have a list of locks associated with a particular lockowner. > Getting that info is not as direct as one would hope. It'll probably involve > taking locks and walking lists, etc. But in NFSv3 case there should be no lockowner at all or there should be a 'joker' lockowner which matches any request. What's more, given that one cannot have the same file/inode/page on NFSv3 and NFSv4 mount at the same time, NFSv3 should not concern itself with this concept. > 2) Even if we did, will we have problems if someone sets a lock after the > page has been dirtied and the lockowner set? I'll need to ponder this part. This is interesting point - arguably client should flush all pages which have been dirtied before granting the lock and not allowing any new pages to be mucked up until the lock is granted in order to obey the intent of locking (as someone put it to me once - NFS locking is only advisory and one doesn't have to follow bad advise). In this case multiple processes are writing to the same file but the cooperation is not done using locking, instead it's done in the application itself. Created attachment 601525 [details]
patch -- skip lockowner related checks on v2/3 mounts
Here's an initial proof-of-concept patch for mainline. I'll probably send this upstream once I write up a description of the problem. Trond will hate it, but it might help spur discussion... ;)
Basically the idea here is to just skip these new checks when we're on a non-stateful mount (vers < 4). This seems to work with your reproducer on v3, but the final fix will probably look different.
Ok, strawman patch and description of the problem sent to Trond and cc'ed to linux-nfs: http://article.gmane.org/gmane.linux.nfs/51240 ...maybe we'll get some traction. Ok, Trond replied back on that thread saying that he'd rather we move to a scheme where we track what open/lock stateid each request was done under. That makes a lot of sense, but will be a bit of an overhaul. He said he's working out a draft patchset for this, so at this point I'm just waiting to see what he comes up with. Development Management has reviewed and declined this request. You may appeal this decision by reopening this request. Reopening for consideration in later releases... This problem is still unresolved upstream, but I think Trond still has a patchset in progress for it. If and when we get a resolution on mainline kernels then we can look at what can be done for existing RHEL releases. This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-1196.html Patch(es) available in kernel-2.6.18-398.el5 You can download this test kernel (or newer) from http://people.redhat.com/agordeev/el5/ Detailed testing feedback is always welcomed. If you require guidance regarding testing, please ask the bug assignee. |