Bug 211931
Summary: | nfsd needs to pass intent information to vfs_create() for GFS | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Ben Marzinski <bmarzins> |
Component: | kernel | Assignee: | Steve Dickson <steved> |
Status: | CLOSED WONTFIX | QA Contact: | Brian Brock <bbrock> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 5.0 | CC: | aviro, dzickus, esandeen, jlayton, kanderso, staubach, steved |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | All | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2008-04-26 12:17:02 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Ben Marzinski
2006-10-23 22:03:51 UTC
Assigning this to Eric for evaluation. It touches all filesystems, so this is a very hairy one... but do feel free to reassign it to the appropriate component owner.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Per Linda's request, assigning to esandeen...

That request was a long time ago ;) I *think* this is better suited for the nfs folks to fix, but if not, bounce it back please. :)

Personally I think it's a very slippery slope for nfsd to be passing partially filled in nameidata structures... Something I'm not in favor of... Also I'm a bit surprised that GFS does not have some way to detect such races... Racing threads trying to create the same file seems like a very common cluster problem... Finally, if O_EXCL is set and the second create loses the race, why is sending back a failure a problem? The file truly does exist...

I'm not sure that I understand you. From GFS's point of view, there is no race. GFS gets a create request, and notices that the file already exists. For every case except for NFS on top of GFS, GFS will get a nameidata structure that tells it whether this create request is an O_EXCL request or not. NFS doesn't pass this information down, because it assumes that if the file doesn't exist when it receives the create request, it will still not exist when NFS passes the request down to the underlying file system. This will always be true for a single machine filesystem. There is no way to guarantee that this will be true on a clustered filesystem. Without the nameidata there is no way for GFS (or any other filesystem) to tell if the file was opened O_EXCL or not.
So if racing threads are trying to create the same file exclusively on multiple nodes running GFS, it will always work fine. One, and only one, will succeed. Put NFS on top of GFS, and now GFS has no way to know whether the create requests were exclusive or not. Currently, if GFS gets a create request, and the file already exists, and there is no nameidata (this can only happen with NFS), it just assumes that the create is not exclusive. This means that exclusively opening a file on NFS over GFS can break POSIX semantics. It is possible for two threads on two separate machines to both think that they exclusively created this file. Unfortunately, this sort of operation is sometimes used by applications to do locking, so breaking POSIX semantics here can cause some fairly large problems.

Ok... I understand... So GFS basically needs an "exclusive" bit so it can tell whether an open(O_EXCL) should fail or succeed. I'm still not a fan of passing down a bastardized nameidata structure. I think that's just going to cause problems with other filesystems. Adding a bit to the mode field would be a bit hackish... and I can't see either one making it in upstream... What seems to be needed is a way to populate the nameidata structure without actually doing the lookup... Unfortunately, such an interface does not exist.

This request was previously evaluated by Red Hat Product Management for inclusion in the current Red Hat Enterprise Linux release, but Red Hat was unable to resolve it in time. This request will be reviewed for a future Red Hat Enterprise Linux release.

Development Management has reviewed and declined this request. You may appeal this decision by reopening this request.
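
For readers unfamiliar with the locking idiom mentioned in the discussion above, here is a minimal userspace sketch of it, assuming a Python application and a hypothetical lock file name (`app.lock`). The helper names `try_acquire` and `release` are illustrative only and do not come from this bug. The sketch relies on the filesystem failing an `O_CREAT | O_EXCL` open when the file already exists, which is exactly the guarantee that is lost when the exclusive intent is not passed down to the underlying filesystem.

```python
import os
import tempfile


def try_acquire(lock_path):
    """Return True if this caller created lock_path, False if it already existed.

    Hypothetical helper: the lock is "held" by whoever created the file.
    """
    try:
        # O_CREAT | O_EXCL asks the filesystem to fail with EEXIST
        # if the file is already present, so only one creator can win.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def release(lock_path):
    """Drop the lock by removing the lock file."""
    os.unlink(lock_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "app.lock")  # hypothetical lock file name
        print(try_acquire(path))  # first caller creates the file
        print(try_acquire(path))  # second caller is refused
        release(path)
```

On a local filesystem the second call is always refused. The scenario described in this bug is the case where two such callers on different NFS clients could both be told that they created the file.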