| Summary: | Enhancement (cloudfs): allow itable to be virtualized | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Jeff Darcy <jdarcy> |
| Component: | core | Assignee: | Anand Avati <aavati> |
| Status: | CLOSED WONTFIX | QA Contact: | |
| Severity: | low | Docs Contact: | |
| Priority: | medium | | |
| Version: | mainline | CC: | chrisw, gluster-bugs |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| URL: | http://git.fedorahosted.org/git/?p=CloudFS.git;a=blob;f=patches/root_inode.patch;h=b78ef2e019be631685d1c5157032d9b8189f621c;hb=HEAD | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Anand Avati
2011-01-21 15:28:38 UTC
(In reply to comment #1)

> Even though the patch by itself is fine, it is not complete. There are many places where the inode table pointer is picked up by dereferencing subvol->itable.

I looked for such references, but I'm not sure this concern applies to many of them. For a start, many of those references are in client code, which is unaffected by any of this. A few are in debug or deprecated code; I checked those anyway, and I think they're OK. The protocol/server code all uses state->itable, which is now being picked up from the right place.

The only area that I think is affected is NFS, which uses constructs like cs->vol->itable in its path-resolution code. This could be changed to work more like protocol/server, e.g. by adding nfs3_call_state_t.itable (much like server_state_t.itable) and invoking the get_itbl interface to populate it, fairly easily. I'm not sure how much that would buy us, though, since there are other issues (e.g. authentication) that also need to be dealt with before the multi-tenancy work will function through NFS.

> I need to understand if your translator needs to 'kick in' and do something on every fop (other than redirecting fops to the appropriate subvolume). If redirecting is all that cloud does for fops, then we can figure out an interface where it can ask protocol server to 'reset' the bound_xl to the appropriate xlator.

I actually tried to implement things this way, but the changes became even more invasive than the patch I've given. For it to work, the authentication and "re-binding" have to be among the very first things done when the connection is established, even before the very first lookup that uses the itable. That would have required fundamental changes to the server connection-setup code, which we could still do if you think that's the right way, but a patch based on that approach seemed likely to be even more troubling than this one.

> Would such a scheme come in your way for the feature set you have in mind?
> Do you have a doc which lists the goals that the cloud xlator is expected to achieve?

The main descriptions are at Fedora (https://fedoraproject.org/wiki/Features/CloudFS) and my own project site (http://cloudfs.org/?p=11).

In a nutshell, the multi-tenant requirement addresses the case where I'm a user in BarnesAndNobleCloud, and BANC has decided to offer "filesystem as a service" to differentiate themselves from their main competitor. That would let me connect my VMs to their big shared filesystem in such a way that I see my own "universe" in terms of both namespace and UID/GID space (since I can make up not just paths but also UIDs/GIDs within my VMs). They get one filesystem to manage; tenants get the appearance of one filesystem each.

For this to work, something on the provider side has to map per-tenant paths and IDs into their global equivalents. That means knowing which "universe" each request belongs to, which in turn depends on strong authentication of the system/connection, not of the individual UID/user. That does not currently exist and might well be implemented as part of the same translator. That's the "cloud" translator. It can already do weak authentication and (with this patch or the one you've proposed) namespace mapping; in the near future it will do stronger authentication and UID/GID mapping as well. You can view the code for the translator itself at http://git.fedorahosted.org/git/?p=CloudFS.git if you're interested.

---

For the multi-tenancy feature of cloudfs, it is necessary to redirect requests to specific subvolumes based on tenant identity. Setting up multiple protocol/server instances within one process is not only cumbersome but incompatible with the volfile-fetching and port-mapping mechanisms, which assume a 1:1:1 identity between processes, ports, and volfiles.
For the most part this can be worked around by putting the tenant-specific subvolumes under a single translator, which binds to protocol/server to handle both subvolume dispatch and other functions (e.g. UID mapping) based on the authenticated identity associated with a connection. Where this fails is in server-resolve, which attempts lookups before any regular translator ever sees the request, using the inode table associated with its single bound_xl.

The patch at the attached URL addresses this by giving the bound_xl a chance to "virtualize" the inode table for each tenant before lookups are done, allowing each tenant a truly separate namespace. Every translator other than the cloudfs multi-tenant translator transparently defaults to a function that preserves existing behavior, so there should be no disruption to existing translators; this functionality is critical for cloudfs and impossible to implement without even more invasive changes to other subsystems. There are some secondary issues to do with the special xattrs used by DHT (and possibly AFR), but the patch as-is handles the essentials. More refinements can be added later.

---

I gave Anand Avati's alternate approach another try and was able to make things work, so long as the reassignment of bound_xl is done early: not necessarily first, but before anything other than an initial root-inode lookup. Thus, this patch is no longer needed, but for posterity I'll note two related issues.

(1) It would still be nice to have an interface to reassign bound_xl. The way the cloud translator does it now is a bit of a layering violation.

(2) This still doesn't address the issues with NFS, which IMO needs something like bound_xl instead of using cs->vol->itable (or similar). As noted, though, this is hardly the only issue that needs to be addressed between NFS and CloudFS.
Despite these remaining issues, many thanks to Anand Avati for pointing the way to a solution that's much cleaner (and more robust) overall. CloudFS doesn't need this any more. Withdrawn.