Bug 1493597

Summary: Performing a manila access-allow on an existing auth entry in Ceph corrupts the permissions.
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Darin Sorrentino <dsorrent>
Component: CephFS
Assignee: Ram Raja <rraja>
Status: CLOSED ERRATA
QA Contact: subhash <vpoliset>
Severity: medium
Priority: medium
Docs Contact:
Version: 2.4
CC: anharris, bniver, ceph-eng-bugs, dsorrent, gmeno, pdonnell, rraja, tbarron, tchandra, tserlin, uboppana
Target Milestone: z4
Keywords: Triaged, ZStream
Target Release: 2.5
Flags: dsorrent: needinfo-
Hardware: All
OS: All
Whiteboard:
Fixed In Version: RHEL: ceph-10.2.10-46.el7cp Ubuntu: ceph_10.2.10-40redhat1
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 1543879 (view as bug list)
Environment:
Last Closed: 2019-04-11 13:32:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Darin Sorrentino 2017-09-20 14:45:07 UTC
Description of problem:

If you create a user in Ceph first and then grant that user access to a share in Manila, Manila corrupts the user's auth entry.

If you have a user in Ceph which you've created as such:

[ceph1] # ceph --name=client.manila --keyring=./manila.keyring auth get-or-create client.application3 -o application3.keyring
[ceph1] # ceph auth get client.application3
exported keyring for client.application3
[client.application3]
	key = AQBue8JZgBBrNxAAYGKQkHCL4kDXDcliND6gLw==

And you then attempt to grant that user access to a share in Manila:

[director] # manila access-allow my-test-share cephx application3

It corrupts the auth entry in Ceph:

[ceph1] # ceph auth get client.application3
exported keyring for client.application3
[client.application3]
	key = AQBue8JZgBBrNxAAYGKQkHCL4kDXDcliND6gLw==
	caps mds = ",allow rw path=/volumes/_nogroup/45a39860-8add-4f47-85c1-7327318e5730"
	caps mon = "allow r"
	caps osd = ",allow rw pool=cephfs_data namespace=fsvolumens_45a39860-8add-4f47-85c1-7327318e5730"


Notice the mds and osd caps now start with a comma before the "allow".  My guess is that, because the auth entry already exists in Ceph, the code assumes the user already has permissions to something and appends the additional capabilities onto what is in fact an empty string; a sketch of that suspicion follows below.
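
A minimal Python sketch of that suspicion (hypothetical helper name, not the actual ceph_volume_client.py code):

def append_cap(existing_caps, wanted_cap):
    # Append the wanted capability to whatever caps the entity already has.
    if wanted_cap in existing_caps:
        return existing_caps
    # Suspected bug: for a pre-created entity with no caps at all,
    # existing_caps is "" and the format below yields a leading comma.
    return "{0},{1}".format(existing_caps, wanted_cap)

existing_mds_caps = ""  # entity was created with 'auth get-or-create', so it has no caps yet
wanted_mds_cap = "allow rw path=/volumes/_nogroup/45a39860-8add-4f47-85c1-7327318e5730"
print(append_cap(existing_mds_caps, wanted_mds_cap))
# prints ",allow rw path=..." -- the same leading comma seen in the corrupted entry above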

If you do NOT create the user first in Ceph and allow Manila to create it, everything looks fine:


ceph auth get client.application5
exported keyring for client.application5
[client.application5]
	key = AQBLe8JZGy/IBRAAVB0gjYibNALggVO2WuRhYQ==
	caps mds = "allow rw path=/volumes/_nogroup/a89f2b95-cd06-414b-b532-4218693ccde3,allow rw path=/volumes/_nogroup/45a39860-8add-4f47-85c1-7327318e5730"
	caps mon = "allow r"
	caps osd = "allow rw pool=cephfs_data namespace=fsvolumens_45a39860-8add-4f47-85c1-7327318e5730,allow rw pool=cephfs_data namespace=fsvolumens_a89f2b95-cd06-414b-b532-4218693ccde3"

Because the capabilities are corrupt, Ceph can no longer parse them and an error is thrown when a client attempts to mount the share.

Version-Release number of selected component (if applicable):

OSP11

How reproducible:

100%

Steps to Reproduce:
1. On ceph, create a user to be used with Manila:
ceph --name=client.manila --keyring=./manila.keyring auth get-or-create client.application3 -o application3.keyring

2. In Manila, create a share and then allow access to the user you just created in Ceph:

manila access-allow my-test-share cephx application3

3. Check the permissions in ceph
ceph auth get client.application3

Actual results:

Any attempt to mount the share fails because Ceph cannot parse the corrupted capabilities.

Expected results:

Mount succeeds.

Additional info:

I've manually fixed the permissions using 'ceph auth caps', removing the leading comma, and the mount then succeeds; an example command is shown below.
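
The exact command isn't shown here, but repairing the entry above would look something like:

[ceph1] # ceph auth caps client.application3 \
    mds 'allow rw path=/volumes/_nogroup/45a39860-8add-4f47-85c1-7327318e5730' \
    mon 'allow r' \
    osd 'allow rw pool=cephfs_data namespace=fsvolumens_45a39860-8add-4f47-85c1-7327318e5730'

Note that 'ceph auth caps' replaces the entity's entire cap set, so all three caps are restated without the leading commas.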

Comment 1 Tom Barron 2017-09-21 13:02:40 UTC
Looks like we may have an issue in the ceph volume client somewhere around https://github.com/ceph/ceph/blob/master/src/pybind/ceph_volume_client.py#L1075

Comment 3 Ram Raja 2017-09-21 13:12:23 UTC
Darin, it looks like a bug in the ceph_volume_client.py, a Ceph client module. Can you please share the version of `python-cephfs` package that you're using?

Comment 4 Ram Raja 2017-09-22 13:37:27 UTC
Filed an upstream Ceph bug to track this issue,
http://tracker.ceph.com/issues/21501

Comment 7 Christina Meno 2017-09-22 17:10:06 UTC
Uday,

This bug got added to 2.4 recently. I don't intend to ship it as part of the 2.4-async we are currently engaged in. What do you think?

cheers

Comment 8 Darin Sorrentino 2017-09-22 17:54:21 UTC
(In reply to Ram Raja from comment #3)
> Darin, it looks like a bug in the ceph_volume_client.py, a Ceph client
> module. Can you please share the version of `python-cephfs` package that
> you're using?

It looks like on the Ceph nodes, it is python-cephfs-10.2.5-37.el7cp.x86_64 and on the OSP11 nodes it is python-cephfs-10.2.7-28.el7cp.x86_64.

[stack@darin-undercloud demo]$ ansible osp11 -a 'rpm -qa' | egrep '(SUCCESS|python-ceph)'
 [WARNING]: Consider using yum, dnf or zypper module rather than running rpm
10.9.65.68 | SUCCESS | rc=0 >>
python-cephfs-10.2.5-37.el7cp.x86_64
10.9.64.13 | SUCCESS | rc=0 >>
python-cephfs-10.2.7-28.el7cp.x86_64
10.9.64.14 | SUCCESS | rc=0 >>
python-cephfs-10.2.7-28.el7cp.x86_64
10.9.64.10 | SUCCESS | rc=0 >>
python-cephfs-10.2.7-28.el7cp.x86_64
10.9.65.69 | SUCCESS | rc=0 >>
python-cephfs-10.2.5-37.el7cp.x86_64
10.9.64.12 | SUCCESS | rc=0 >>
python-cephfs-10.2.7-28.el7cp.x86_64
10.9.65.70 | SUCCESS | rc=0 >>
python-cephfs-10.2.5-37.el7cp.x86_64
localhost | SUCCESS | rc=0 >>
[stack@darin-undercloud demo]$ 

I saw the previous comment stating OSP11 ships with 10.2.7-32; however, OSP11 actually ships with 10.2.7-28 in the overcloud-full image, and the customer would need to register/update all of the overcloud nodes to get 10.2.7-32.

Comment 9 Ram Raja 2017-09-24 16:56:04 UTC
(In reply to Gregory Meno from comment #7)
> Uday,
> 
> This bug got added to 2.4 recently. I don't intend to ship it as part of
> the 2.4-async we are currently engaged in. What do you think?
> 
> cheers

Gregory, to give more context: in OSP 11, OpenStack Manila's CephFS native driver is tech preview, and the driver uses Ceph's ceph_volume_client module.

The bug is being fixed in Ceph upstream master,
https://github.com/ceph/ceph/pull/17935
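
If the guess in the bug description is right, the fix presumably boils down to not joining the wanted capability onto an empty existing cap string; a rough sketch of that idea (hypothetical helper, not the literal patch):

def update_cap(existing_caps, wanted_cap):
    # Build the new cap string for one daemon type (mds or osd).
    if not existing_caps:
        # Pre-created entity with no caps yet: use the wanted cap as-is
        # instead of appending it to an empty string.
        return wanted_cap
    if wanted_cap in existing_caps:
        return existing_caps
    return "{0},{1}".format(existing_caps, wanted_cap)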

Also, let me re-add Uday to get more info. The needinfo flag was accidentally removed in Comment 8.

Comment 17 Patrick Donnelly 2019-01-04 19:35:10 UTC
This seems to have been forgotten about. Sorry about that. The patch is upstream in 10.2.11. Do we still want to backport this to 2.5z4? Or we can close this.

Comment 27 subhash 2019-03-26 15:45:02 UTC
Moving this bug to the verified state. Followed the steps in the description of https://bugzilla.redhat.com/show_bug.cgi?id=1543879.

[ubuntu@magna009 ~]$ rpm -qa | grep python-ce
python-cephfs-10.2.10-49.el7cp.x86_64
[ubuntu@magna009 ~]$ rpm -qa | grep ceph
ceph-common-10.2.10-49.el7cp.x86_64

Comment 29 errata-xmlrpc 2019-04-11 13:32:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:0747