Description of problem:

Modify the Ceph FSAL to mount the subtree for each export instead of the entire filesystem, and to mount using a ceph auth ID whose MDS caps are restricted to the subtree path, rather than needing to use the auth ID 'admin', which has no MDS path cap restrictions.
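For context, a path-restricted auth ID of the sort described here can be created with something along these lines (the user name, path, and pool are illustrative placeholders; the caps syntax matches the keyrings shown later in this bug):

$ ceph auth get-or-create client.someuser \
      mds 'allow rw path=/volumes/sub/dir' \
      mon 'allow r' \
      osd 'allow rw pool=cephfs_data'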
Looks reasonable. I've taken a brief look and see what we'll have to rework in ganesha to make this happen. I think we have to change the lookup_path prototype to deal with a path prefix (representing the path up to the root of the export) and any path relative to that namespace root.

OTOH, if we never look up anything but the root of the export, then maybe we don't need lookup_path to be that flexible, and can just teach ceph to ignore the path and always just look up the root of the export.
(In reply to Jeff Layton from comment #1)
> Looks reasonable. I've taken a brief look and see what we'll have to rework
> in ganesha to make this happen. I think we have to change the lookup_path
> prototype to deal with a path prefix (representing the path up to the root
> of the export) and any path relative to that namespace root.
>
> OTOH, if we never look up anything but the root of the export, then maybe we
> don't need lookup_path to be that flexible, and can just teach ceph to
> ignore the path and always just look up the root of the export.

lookup_path is used to set up an export, and it is also used for NFSv3 mounts of a sub-directory of the export.
I'll take this for now since I'm working on a patch. The first step, I think, is to mount the subtree using the same creds we use now. Once we have that, we can look at adding a new ganesha config option to make it use different creds.
Gerrit review request is here:

https://review.gerrithub.io/#/c/305345/

The patch turned out to be a little simpler than expected, and seems to work just fine with everything that I've pointed at it so far. The next step is to add a way to give each export a different cephx user. What I'm not clear on is how to manipulate cephx creds from the libcephfs API. Maybe there is some ceph_conf_set option that we can use?
> The next step is to add a way to give each export a different cephx user.
> What I'm not clear on is how to manipulate cephx creds from the libcephfs
> api. Maybe there is some ceph_conf_set option that we can use?

Why would the libcephfs API need to manipulate the cephx creds? I was thinking that a per-export FSAL_CEPH user option would supply the ceph auth ID (already created with path-restricted MDS caps) to be used for that export. So FSAL_CEPH would pass the auth ID as an argument to `ceph_create`?
Ahh, that's what I was missing. Yes, passing a user string argument to ceph_create is what we'd want. I'll look at what's needed to add a new config option to the FSAL CEPH section in the ganesha configs.
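To make the intent concrete, here is a minimal sketch of that flow against the libcephfs API. This is not the FSAL code itself; the auth ID and path are examples taken from later in this thread, and the secret key is assumed to come from a keyring found via the usual config files:

#include <stdio.h>
#include <cephfs/libcephfs.h>

int main(void)
{
        struct ceph_mount_info *cmount;
        int rc;

        /* "alice" gets appended onto "client.", yielding cephx user client.alice */
        rc = ceph_create(&cmount, "alice");
        if (rc)
                return 1;

        /* pick up mon addresses, keyring, etc. from the default config paths */
        ceph_conf_read_file(cmount, NULL);

        /* mount only the subtree, not the root of the filesystem */
        rc = ceph_mount(cmount, "/volumes/_nogroup/b86384b2-c52c-4607-bcaf-ff294acf1b97");
        if (rc) {
                fprintf(stderr, "ceph_mount failed: %d\n", rc);
                ceph_release(cmount);
                return 1;
        }

        ceph_unmount(cmount);
        ceph_release(cmount);
        return 0;
}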
Ok! I think I have something that may work for you in my ganesha ceph-submount branch. See:

https://github.com/jtlayton/nfs-ganesha/tree/ceph-submount

Ramana, if you have the time, could you try that out? This branch should build against kraken (with libcephfs2) or jewel (with libcephfs1), though you may need to drop the specfile patch in that pile if you're building against packages that have a libcephfs1-devel package.

You should be able to set the export FSAL user_id and secret_access_key options in the config file, or using dbus. If you provide a user_id but no secret key, then it should try to find the key for that user in the usual keyring files.
Note that this branch builds, but is otherwise untested!
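For reference, the new options would sit in the export's FSAL block, something like this (values here are placeholders; full working configs appear in the comments below):

EXPORT
{
        Export_ID = 100;
        Path = /some/subtree;
        Pseudo = /some/subtree;
        Protocols = 4;
        FSAL {
                Name = CEPH;
                # cephx auth ID, without the "client." prefix
                User_Id = "someuser";
                # optional; if omitted, ganesha looks in the usual keyring files
                Secret_Access_Key = "<base64 cephx key>";
        }
}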
Ramana said that he tried the patches, but that they weren't working as expected.

Ramana, can you maybe detail how you're testing this? Specifically, how are the new users and keys being created, and what values are you providing in the new configuration options?
Jeff, sorry about the delay.

Testing steps
-------------

1. Installed the jewel version of Ceph in a VM.

$ ceph --version
ceph version 10.2.4-3-gc461ee1 (c461ee19ecbc0c5c330aca20f7392c9a00730367)

2. Set up an all-in-one Ceph backend (1 MON, 1 MDS, 1 OSD) in the VM.

$ sudo ceph -s
    cluster 0df98128-b7e6-434c-96c7-5f729623d1b6
     health HEALTH_OK
     monmap e1: 1 mons at {osboxes=10.0.2.15:6789/0}
            election epoch 8, quorum 0 osboxes
      fsmap e35: 1/1/1 up {0=a=up:active}
     osdmap e30: 1 osds: 1 up, 1 in
            flags sortbitwise,require_jewel_osds
      pgmap v3436: 80 pgs, 3 pools, 984 kB data, 24 objects
            160 MB used, 8021 MB / 8182 MB avail
                  80 active+clean

Ceph config file:

$ cat /etc/ceph/ceph.conf
[global]
fsid = 0df98128-b7e6-434c-96c7-5f729623d1b6
mon_initial_members = osboxes
mon_host = 10.0.2.15
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd crush chooseleaf type = 0
osd journal size = 100
osd pool default size = 1
rbd default features = 1

[client.manila]
client mount uid = 0
client mount gid = 0

3. Installed NFS-Ganesha from source (using your Ganesha dev branch) in the same VM.

$ cd ~/git/nfs-ganesha/ && cat .git/refs/heads/ceph-submount
059024237ca2968d4d1670d14551b2befdbe06d7

4. Allowed Ganesha to mount a CephFS subtree using:

* user ID 'manila' and its secret key passed as FSAL options
* user ID 'manila' passed as a FSAL option, with the keyring file in the default location

Both seemed to work. Note that the 'manila' user *did not* have any path-restricted MDS caps.

$ sudo ceph auth get client.manila
exported keyring for client.manila
[client.manila]
        key = AQBaUylYMrj9EBAA33JPniv+QhVjeXnETocGCw==
        caps mds = "allow *"
        caps mon = "allow *"
        caps osd = "allow rw"

$ cat /etc/ganesha/ganesha.conf
EXPORT
{
        Export_ID=100;
        Path = /volumes/_nogroup/b86384b2-c52c-4607-bcaf-ff294acf1b97;
        Pseudo = /volumes/_nogroup/b86384b2-c52c-4607-bcaf-ff294acf1b97;
        Protocols = 4;
        Transports = TCP;
        FSAL {
                Name = CEPH;
                User_Id = "manila";
                Secret_Access_Key = "AQBaUylYMrj9EBAA33JPniv+QhVjeXnETocGCw==";
        }
        CLIENT {
                Clients = 10.0.2.13;
                Access_Type = RO;
        }
        CLIENT {
                Clients = 10.0.2.15;
                Access_Type = RW;
        }
}

$ sudo ganesha.nfsd -f /etc/ganesha/ganesha.conf -L /tmp/ganesha-user-manila-secret-key.log -N NIV_DEBUG

$ sudo ceph daemon mds.a session ls
[
    {
        "id": 54109,
        "num_leases": 0,
        "num_caps": 4,
        "state": "open",
        "replay_requests": 0,
        "completed_requests": 0,
        "reconnecting": false,
        "inst": "client.54109 10.0.2.15:0\/2337762318",
        "client_metadata": {
            "ceph_sha1": "c461ee19ecbc0c5c330aca20f7392c9a00730367",
            "ceph_version": "ceph version 10.2.4-3-gc461ee1 (c461ee19ecbc0c5c330aca20f7392c9a00730367)",
            "entity_id": "manila",
            "hostname": "osboxes",
            "root": "\/"
        }
    }
]

A question here: I was expecting the value of "root" in the "client_metadata" section to be the subtree path, since Ganesha mounts only the subtree?

5. I wasn't able to get Ganesha to mount a subtree using a user ID with path-restricted MDS caps.
$ sudo ceph auth get client.alice
exported keyring for client.alice
[client.alice]
        key = AQCj1i5YOO/gOhAADVpTk6XTXNoERTpvPyUfqQ==
        caps mds = "allow rw path=/volumes/_nogroup/b86384b2-c52c-4607-bcaf-ff294acf1b97"
        caps mon = "allow r"
        caps osd = "allow rw pool=cephfs_data namespace=fsvolumens_b86384b2-c52c-4607-bcaf-ff294acf1b97"

$ cat /etc/ganesha/ganesha.conf
EXPORT
{
        Export_ID=100;
        Path = /volumes/_nogroup/b86384b2-c52c-4607-bcaf-ff294acf1b97;
        Pseudo = /volumes/_nogroup/b86384b2-c52c-4607-bcaf-ff294acf1b97;
        Protocols = 4;
        Transports = TCP;
        FSAL {
                Name = CEPH;
                User_Id = "alice";
                Secret_Access_Key = "AQCj1i5YOO/gOhAADVpTk6XTXNoERTpvPyUfqQ==";
        }
        CLIENT {
                Clients = 10.0.2.13;
                Access_Type = RO;
        }
        CLIENT {
                Clients = 10.0.2.15;
                Access_Type = RW;
        }
}

I observed the following errors in ganesha.log after starting the Ganesha server:

create_export :FSAL :CRIT :Unable to mount Ceph cluster for /volumes/_nogroup/b86384b2-c52c-4607-bcaf-ff294acf1b97.
mdcache_fsal_create_export :FSAL :MAJ :Failed to call create_export on underlying FSAL Ceph
fsal_put :FSAL :INFO :FSAL Ceph now unused
l_cfg_commit :CONFIG :CRIT :Could not create export for (/volumes/_nogroup/b86384b2-c52c-4607-bcaf-ff294acf1b97) to (/volumes/_nogroup/b86384b2-c52c-4607-bcaf-ff294acf1b97)
build_default_root :CONFIG :DEBUG :Allocating Pseudo root export
pseudofs_create_export :FSAL :DEBUG :Created exp 0x10809c0 - /
build_default_root :CONFIG :INFO :Export 0 (/) successfully created
main :NFS STARTUP :WARN :No export entries found in configuration file !!!
config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:13): 1 validation errors in block FSAL
config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:13): Errors processing block (FSAL)
config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:1): 1 validation errors in block EXPORT
config_errs_to_log :CONFIG :CRIT :Config File (/etc/ganesha/ganesha.conf:1): Errors processing block (EXPORT)

Line 13 in ganesha.conf is the CEPH FSAL sub-block. I guess the ceph_mount call using 'alice' here,

https://github.com/jtlayton/nfs-ganesha/commit/077a2d26b1863a1f42ef54e14446d423af6c016a#diff-548844599138ebf79915b9d8e9e59031R215

hits EPERM?

However, I was able to mount the subtree using the same auth ID 'alice' via ceph-fuse.

$ sudo ceph-fuse /mnt/fuse/ --id=alice --client-mountpoint=/volumes/_nogroup/b86384b2-c52c-4607-bcaf-ff294acf1b97

$ sudo ceph daemon mds.a session ls
[
    {
        "id": 54120,
        "num_leases": 0,
        "num_caps": 1,
        "state": "open",
        "replay_requests": 0,
        "completed_requests": 0,
        "reconnecting": false,
        "inst": "client.54120 10.0.2.15:0\/2847125433",
        "client_metadata": {
            "ceph_sha1": "c461ee19ecbc0c5c330aca20f7392c9a00730367",
            "ceph_version": "ceph version 10.2.4-3-gc461ee1 (c461ee19ecbc0c5c330aca20f7392c9a00730367)",
            "entity_id": "alice",
            "hostname": "osboxes",
            "mount_point": "\/mnt\/fuse",
            "root": "\/volumes\/_nogroup\/b86384b2-c52c-4607-bcaf-ff294acf1b97"
        }
    }
]
Yeah, that "session ls" output does look odd. "alice" made it to the "entity_id", but AFAICT "id" is set to the default, and I suspect that that is what gets used.

libcephfs.h says:

/**
 * Create a mount handle for interacting with Ceph. All libcephfs
 * functions operate on a mount info handle.
 *
 * @param cmount the mount info handle to initialize
 * @param id the id of the client. This can be a unique id that identifies
 *           this client, and will get appended onto "client.". Callers can
 *           pass in NULL, and the id will be the process id of the client.
 * @returns 0 on success, negative error code on failure
 */
int ceph_create(struct ceph_mount_info **cmount, const char * const id);

...which sure makes it sound like that would be the "username" for the cephx creds.

Maybe I'll roll up a ceph regression test for this as well...
Actually... Ramana, can you try one other thing? Assuming that the keyring file is in a standard location, does it work if you comment out the Secret_Access_Key field, which should let ganesha scrape the key out of the keyring? Maybe we're just misunderstanding how that field needs to be set...
I hit the same error when I didn't pass the Secret_Access_Key FSAL option for user 'alice' (with path-restricted MDS caps), and 'alice's keyring file was in the standard location.

I'd be surprised if it's an issue with setting the Secret_Access_Key field. Export creation succeeded when I set the Secret_Access_Key field for user 'manila' (with *no* path-restricted MDS caps), and it failed when I intentionally set an incorrect Secret_Access_Key for 'manila'. In both cases, the 'manila' keyring file was not in the default location.
Got it... thanks. I do seem to recall some discussion about submounts during the kraken development cycle, which led me to believe that they might not work correctly in jewel. That said, I think I'm going to have to roll up a test program to understand how the path-restricted caps should work...
Ok, the problem turns out to be a ceph bug:

http://tracker.ceph.com/issues/18254

I have a patch to fix this in ceph, but it'll take a while to trickle out to the stable releases. In the meantime, I've updated my ganesha branch with a workaround for this bug. The workaround is pretty harmless, so I'm inclined to merge it into ganesha and just live with it in perpetuity.

Ramana, can you try my latest ceph-submount ganesha branch and let me know whether it works?
Setting the `client_mountpoint` conf option seems to have done the trick:

https://github.com/jtlayton/nfs-ganesha/commit/1f76a9ec739ad20190f2e1b469e1464c7cb26cc6#diff-548844599138ebf79915b9d8e9e59031R261

Yes, Ganesha is now able to mount the CephFS subtree using an auth ID (and secret key) whose MDS caps are restricted to accessing only that subtree.

1. User 'alice' has MDS caps restricted to the subtree /volumes/_nogroup/b86384b2-c52c-4607-bcaf-ff294acf1b97:

$ sudo ceph auth get client.alice
exported keyring for client.alice
[client.alice]
        key = AQCj1i5YOO/gOhAADVpTk6XTXNoERTpvPyUfqQ==
        caps mds = "allow rw path=/volumes/_nogroup/b86384b2-c52c-4607-bcaf-ff294acf1b97"
        caps mon = "allow r"
        caps osd = "allow rw pool=cephfs_data namespace=fsvolumens_b86384b2-c52c-4607-bcaf-ff294acf1b97"

2. Set ganesha.conf to export the subtree, which FSAL_CEPH mounts using ceph user 'alice':

$ cat /etc/ganesha/ganesha.conf
EXPORT
{
        Export_ID=100;
        Path = /volumes/_nogroup/b86384b2-c52c-4607-bcaf-ff294acf1b97;
        Pseudo = /volumes/_nogroup/b86384b2-c52c-4607-bcaf-ff294acf1b97;
        Protocols = 4;
        Transports = TCP;
        FSAL {
                Name = CEPH;
                User_Id = "alice";
                Secret_Access_Key = "AQCj1i5YOO/gOhAADVpTk6XTXNoERTpvPyUfqQ==";
        }
        CLIENT {
                Clients = 10.0.2.15;
                Access_Type = RW;
        }
}

3. Starting the ganesha server created the export, and listing the Ceph MDS's client sessions shows that the subtree was mounted as user 'alice':

$ sudo ceph daemon mds.a session ls
[
    {
        "id": 64103,
        "num_leases": 0,
        "num_caps": 4,
        "state": "open",
        "replay_requests": 0,
        "completed_requests": 2,
        "reconnecting": false,
        "inst": "client.64103 10.0.2.15:0\/2287299420",
        "client_metadata": {
            "ceph_sha1": "c461ee19ecbc0c5c330aca20f7392c9a00730367",
            "ceph_version": "ceph version 10.2.4-3-gc461ee1 (c461ee19ecbc0c5c330aca20f7392c9a00730367)",
            "entity_id": "alice",
            "hostname": "osboxes",
            "root": "\/volumes\/_nogroup\/b86384b2-c52c-4607-bcaf-ff294acf1b97"
        }
    }
]
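For anyone following along, the workaround amounts to something along these lines in the FSAL's export setup (a sketch of the idea rather than the exact ganesha code; see the commit linked above; cmount, rc, and the export path variable are assumed to be in scope):

        /* work around http://tracker.ceph.com/issues/18254: pin the client's
         * root to the export path before mounting */
        ceph_conf_set(cmount, "client_mountpoint", export_path);
        rc = ceph_mount(cmount, export_path);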
Great! I've got a PR up with the fix for ceph, but I think keeping the workaround in ganesha for a while is a reasonable thing to do as well.

Now that we have a PoC that works, we should discuss whether this design makes sense. The ceph docs sort of indicate that managing keys directly is a bad idea and that we should use keyring files. I'm not 100% convinced there, but we could replace the key option with a global CEPH section config option to point ganesha at a keyring file instead. Would that be simpler for the manila integration work?

IOW, I'd like your opinion on where you think the keys should ideally be stored.
> Great! I've got a PR up with the fix for ceph, but I think keeping the
> workaround in ganesha for a while is a reasonable thing to do as well.

Makes sense.

> Now that we have a PoC that works, we should discuss whether this design
> makes sense. The ceph docs sort of indicate that managing keys directly is
> a bad idea and that we should use keyring files.

Yeah, we need to check whether this is a bad idea for our use case. FSAL_RGW allows passing a secret_access_key,

https://github.com/nfs-ganesha/nfs-ganesha/commit/674f265c#diff-1902eb57168c2420d3657a4cd7052e37R19

so it might be OK for FSAL_CEPH to do so too?

> I'm not 100% convinced there, but we could replace the key option with a
> global CEPH section config option to point ganesha at a keyring file
> instead. Would that be simpler for the manila integration work?

My thoughts on the Manila/Ganesha/CephFS integration: when a Manila user requests IP access to a CephFS subdir:

* Manila's Ganesha driver, using `ceph_volume_client.py`, would create a cephx user (if it does not already exist) with path-restricted MDS caps, and fetch the user's secret key. This is the per-share cephx user and secret key that FSAL_CEPH would use to mount the Ceph subtree.

* The Ganesha driver would then construct an export block like the following to allow IP access:

EXPORT
{
        Export_ID=100;
        Path = $cephfs_subdir_path;
        Pseudo = $cephfs_subdir_path;
        Protocols = 4;
        Transports = TCP;
        FSAL {
                Name = CEPH;
                User_Id = "<manila_share_uuid>";
                Secret_Access_Key = "<secret-key>";
        }
        CLIENT {
                Clients = <ip>;
                Access_Type = RW;
        }
}

The export block would be written to a file on disk for persistence across Ganesha server restarts, and the export would be added dynamically via DBUS (see the example after this comment).

* For subsequent IP access rule changes on the share, the Ganesha driver would manipulate only the CLIENT sub-blocks of the export block, write the changes back to the export block file, and then dynamically update the export via DBUS.

Hope this description helps.
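As an illustration of that dynamic-add step, the DBus call would look something like this (the file path and export selector here are placeholders, and the exact interface may vary across ganesha versions):

$ dbus-send --system --print-reply --dest=org.ganesha.nfsd \
      /org/ganesha/nfsd/ExportMgr org.ganesha.nfsd.exportmgr.AddExport \
      string:/etc/ganesha/export.d/share-100.conf \
      string:'EXPORT(Export_ID=100)'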
I'll leave that up to you... I don't know much about the environment where this thing runs, though. AIUI, cephx keyring files are a lot like krb5 keytabs. The main reason for using keyring files would be that you could probably keep permissions on a keyring file locked down a little more tightly than on the ganesha config file.

Managing the keys in the ganesha config file should be ok, but you'll need to keep read permissions on ganesha.conf locked down to protect the keys (depending on who has access to the host, of course).
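For example, a per-user keyring dropped into the default search path and locked down might look like this (names illustrative; the file format matches the `ceph auth get` output earlier in this bug):

$ cat /etc/ceph/ceph.client.alice.keyring
[client.alice]
        key = AQCj1i5YOO/gOhAADVpTk6XTXNoERTpvPyUfqQ==

$ sudo chown root:root /etc/ceph/ceph.client.alice.keyring
$ sudo chmod 600 /etc/ceph/ceph.client.alice.keyring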
Frank has merged the patchset into the ganesha next branch, so it should be good to go in v2.5.