Bug 1575023

Summary: manila-share fails initialization ceph-ansible > 3.1.0-0.1.beta4
Product: Red Hat OpenStack Reporter: Tom Barron <tbarron>
Component: openstack-tripleo-heat-templates Assignee: Giulio Fidente <gfidente>
Status: CLOSED ERRATA QA Contact: Yogev Rabl <yrabl>
Severity: high Docs Contact:
Priority: high    
Version: 13.0 (Queens) CC: adeza, aherr, aschoen, ceph-eng-bugs, dmacpher, gfidente, gmeno, jamsmith, jschluet, m.andre, mburns, nthomas, pgrist, sankarshan, scohen
Target Milestone: rc Keywords: Triaged
Target Release: 13.0 (Queens) Flags: scohen: needinfo+
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-8.0.2-15.el7ost Doc Type: Known Issue
Doc Text:
The manila-share service fails to initialize because changes to ceph-ansible's complex ceph-keys processing generate incorrect content in the /etc/ceph/ceph.client.manila.keyring file. To allow the manila-share service to initialize:
1) Make a copy of /usr/share/openstack-tripleo-heat-templates to use for the overcloud deploy.
2) Edit the .../tripleo-heat-templates/docker/services/ceph-ansible/ceph-base.yaml file in the copy to change all triple backslashes in line 295 to single backslashes.
   Before: mon_cap: 'allow r, allow command \\\"auth del\\\", allow command \\\"auth caps\\\", allow command \\\"auth get\\\", allow command \\\"auth get-or-create\\\"'
   After: mon_cap: 'allow r, allow command \"auth del\", allow command \"auth caps\", allow command \"auth get\", allow command \"auth get-or-create\"'
3) Deploy the overcloud, substituting the path to the copy of tripleo-heat-templates wherever /usr/share/openstack-tripleo-heat-templates occurred in your original overcloud-deploy command.
The /etc/ceph/ceph.client.manila.keyring file will then have the proper contents and the manila-share service will initialize properly.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-06-27 13:55:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1469208, 1489934, 1489938    
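
Step 2 of the Doc Text workaround could be scripted roughly as follows. This is only a sketch: the helper name fix_mon_cap_escaping is hypothetical, and it assumes the affected setting sits on a single line containing "mon_cap:", as the Before/After example in the Doc Text suggests. It works on text, so reading and writing the copied ceph-base.yaml is left to the caller.

```python
# Hypothetical helper: applies step 2 of the Doc Text workaround to the text
# of a copied ceph-base.yaml. Only lines that mention mon_cap are touched.

TRIPLE = '\\\\\\"'  # three backslashes followed by a double quote
SINGLE = '\\"'      # one backslash followed by a double quote


def fix_mon_cap_escaping(yaml_text):
    fixed = []
    for line in yaml_text.splitlines(keepends=True):
        if "mon_cap:" in line:
            line = line.replace(TRIPLE, SINGLE)
        fixed.append(line)
    return "".join(fixed)


if __name__ == "__main__":
    # Shortened form of the "Before" line from the Doc Text.
    before = r"""mon_cap: 'allow r, allow command \\\"auth del\\\", allow command \\\"auth caps\\\"'"""
    print(fix_mon_cap_escaping(before))
```

Run it against the copy made in step 1, never against the packaged templates, and diff the result before deploying.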

Description Tom Barron 2018-05-04 14:49:59 UTC
Description of problem:

manila-share service fails initialization with:

2018-05-02 19:52:48.288 45 DEBUG manila.share.manager [req-bc4bfb71-c3a6-498c-8d57-097b5dbc32b6 - - - - -] Start initialization of driver: 'CephFSDriver@hostgroup@cephfs' _driver_setup /usr/lib/python2.7/site-packages/manila/share/manager.py:303
2018-05-02 19:52:48.290 45 INFO manila.share.drivers.cephfs.driver [req-bc4bfb71-c3a6-498c-8d57-097b5dbc32b6 - - - - -] [cephfs}] Ceph client found, connecting...
2018-05-02 19:52:48.290 45 DEBUG ceph_volume_client [req-bc4bfb71-c3a6-498c-8d57-097b5dbc32b6 - - - - -] Connecting to RADOS with config /etc/ceph/ceph.conf... connect /usr/lib/python2.7/site-packages/ceph_volume_client.py:449
2018-05-02 19:52:48.348 45 DEBUG ceph_volume_client [req-bc4bfb71-c3a6-498c-8d57-097b5dbc32b6 - - - - -] Connection to RADOS complete connect /usr/lib/python2.7/site-packages/ceph_volume_client.py:458
2018-05-02 19:52:48.348 45 DEBUG ceph_volume_client [req-bc4bfb71-c3a6-498c-8d57-097b5dbc32b6 - - - - -] Connecting to cephfs... connect /usr/lib/python2.7/site-packages/ceph_volume_client.py:460
2018-05-02 19:52:48.348 45 DEBUG ceph_volume_client [req-bc4bfb71-c3a6-498c-8d57-097b5dbc32b6 - - - - -] CephFS initializing... connect /usr/lib/python2.7/site-packages/ceph_volume_client.py:462
2018-05-02 19:52:48.350 45 DEBUG ceph_volume_client [req-bc4bfb71-c3a6-498c-8d57-097b5dbc32b6 - - - - -] Premount eviction of manila starting connect /usr/lib/python2.7/site-packages/ceph_volume_client.py:465
2018-05-02 19:52:48.351 45 INFO ceph_volume_client [req-bc4bfb71-c3a6-498c-8d57-097b5dbc32b6 - - - - -] evict clients with auth_name=manila
2018-05-02 19:52:48.354 45 ERROR manila.share.manager [req-bc4bfb71-c3a6-498c-8d57-097b5dbc32b6 - - - - -] Error encountered during initialization of driver CephFSDriver@hostgroup@cephfs: Error: access denied
2018-05-02 19:52:48.354 45 ERROR manila.share.manager Traceback (most recent call last):
2018-05-02 19:52:48.354 45 ERROR manila.share.manager   File "/usr/lib/python2.7/site-packages/manila/share/manager.py", line 305, in _driver_setup
2018-05-02 19:52:48.354 45 ERROR manila.share.manager     self.driver.do_setup(ctxt)
2018-05-02 19:52:48.354 45 ERROR manila.share.manager   File "/usr/lib/python2.7/site-packages/manila/share/drivers/cephfs/driver.py", line 125, in do_setup
2018-05-02 19:52:48.354 45 ERROR manila.share.manager     ceph_vol_client=self.volume_client)
2018-05-02 19:52:48.354 45 ERROR manila.share.manager   File "/usr/lib/python2.7/site-packages/manila/share/drivers/cephfs/driver.py", line 195, in volume_client
2018-05-02 19:52:48.354 45 ERROR manila.share.manager     self._volume_client.connect(premount_evict=premount_evict)
2018-05-02 19:52:48.354 45 ERROR manila.share.manager   File "/usr/lib/python2.7/site-packages/ceph_volume_client.py", line 466, in connect
2018-05-02 19:52:48.354 45 ERROR manila.share.manager     self.evict(premount_evict)
2018-05-02 19:52:48.354 45 ERROR manila.share.manager   File "/usr/lib/python2.7/site-packages/ceph_volume_client.py", line 391, in evict
2018-05-02 19:52:48.354 45 ERROR manila.share.manager     mds_map = self._rados_command("mds dump", {})
2018-05-02 19:52:48.354 45 ERROR manila.share.manager   File "/usr/lib/python2.7/site-packages/ceph_volume_client.py", line 1277, in _rados_command
2018-05-02 19:52:48.354 45 ERROR manila.share.manager     raise rados.Error(outs)
2018-05-02 19:52:48.354 45 ERROR manila.share.manager Error: access denied
2018-05-02 19:52:48.354 45 ERROR manila.share.manager
Version-Release number of selected component (if applicable):


How reproducible:

We see the issue with our only deployment of the 2018-04-26.3 puddle.
Did not see the issue with 2018-04-11 ceph_pending_latest puddle.

Steps to Reproduce:
1. Deploy manila with ceph-nfs back end.
2. Observe the /var/log/containers/manila/manila-share log on the controller
   on which manila-share is running.

Actual results:

Share service initialization fails with the above log messages.

Expected results:

Initialization succeeds without tracebacks.

Additional info:

Comment 1 Tom Barron 2018-05-04 15:03:51 UTC
Permissions for the ceph keyfiles in the manila-share container may be the issue.

On ceph-daemon based containers the keys look like this:

[root controller-1 ~]# docker exec ceph-nfs-pacemaker ls -ld /etc/ceph
drwxr-xr-x. 2 ceph ceph 200 May  2 19:22 /etc/ceph
[root controller-1 ~]# docker exec ceph-nfs-pacemaker ls -l /etc/ceph
total 28
-rw-------. 1 root root 159 May  2 19:20 ceph.client.admin.keyring
-rw-------. 1 ceph ceph 292 May  2 19:22 ceph.client.manila.keyring
-rw-------. 1 ceph ceph 299 May  2 19:22 ceph.client.openstack.keyring
-rw-------. 1 ceph ceph 149 May  2 19:22 ceph.client.radosgw.keyring
-rw-r--r--. 1 root root 931 May  2 19:19 ceph.conf
-rw-------. 1 ceph ceph 688 May  2 19:20 ceph.mon.keyring
-rw-r--r--. 1 root root  92 Apr  6 04:17 rbdmap

whereas in the manila-share container:

[root controller-1 ~]# docker exec openstack-manila-share-docker-0 ls -ld /etc/ceph
drwxr-xr-x. 2 167 167 200 May  2 19:22 /etc/ceph
[root controller-1 ~]# docker exec openstack-manila-share-docker-0 ls -l /etc/ceph
total 28
-rw-------. 1 root root 159 May  2 19:20 ceph.client.admin.keyring
-rw-------. 1  167  167 292 May  2 19:22 ceph.client.manila.keyring
-rw-------. 1  167  167 299 May  2 19:22 ceph.client.openstack.keyring
-rw-------. 1  167  167 149 May  2 19:22 ceph.client.radosgw.keyring
-rw-r--r--. 1 root root 931 May  2 19:19 ceph.conf
-rw-------. 1  167  167 688 May  2 19:20 ceph.mon.keyring
-rw-r--r--. 1 root root  92 Apr  6 04:17 rbdmap

There is no user with uid/gid 167 inside the manila-share container, and the ceph user there has uid/gid 64045:
[root controller-1 ~]# docker exec openstack-manila-share-docker-0 grep ':167:' /etc/passwd
[root controller-1 ~]# docker exec openstack-manila-share-docker-0 grep ':167:' /etc/group
[root controller-1 ~]# docker exec openstack-manila-share-docker-0 grep ceph /etc/passwd
ceph:x:64045:64045::/home/ceph:/usr/sbin/nologin
[root controller-1 ~]# docker exec openstack-manila-share-docker-0 grep ceph /etc/group
ceph:x:64045:

Whereas in the ceph-daemon image and on the host itself the ceph user has uid/gid 167:
[root controller-1 ~]# docker exec ceph-nfs-pacemaker grep ceph /etc/passwd
ceph:x:167:167:Ceph daemons:/var/lib/ceph:/sbin/nologin
[root controller-1 ~]# docker exec ceph-nfs-pacemaker grep ceph /etc/group
ceph:x:167:
[root controller-1 ~]# grep ceph /etc/passwd
ceph:x:167:167:Ceph daemons:/var/lib/ceph:/sbin/nologin
[root controller-1 ~]# grep ceph /etc/group
ceph:x:167:

Comment 2 Tom Barron 2018-05-05 11:46:48 UTC
The file ownership and permissions differences between the two puddles actually don't matter since the manila-share container runs as root and can read all the keyfiles.  The issue is reproducible on the host, outside the container:

[root@controller-0 ~]# ceph -n client.manila --keyring=/etc/ceph/ceph.client.manila.keyring mds dump
Error EACCES: access denied

and is due to a bad manila client keyring:

[root@controller-0 ~]# cat /etc/ceph/ceph.client.manila.keyring
[client.manila]
        key = AQDSe+daAAAAABAAQ+8L/490ZS8AQefbKWwYdg==
        caps mds = "allow *"
        caps mgr = "allow *"
        caps mon = "allow r, allow command \\\"auth del\\\", allow command \\\"auth caps\\\", allow command \\\"auth get\\\", allow command \\\"auth get-or-create\\\""
        caps osd = "allow rw"
[root@controller-0 ~]#
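
A quick way to spot this kind of keyring is to look for doubled backslashes in the caps values. The sketch below is illustrative only: over_escaped_caps is a hypothetical name, and the parsing assumes the simple `caps <name> = "<value>"` layout shown in the cat output above, not every form a keyring file may take.

```python
import re

# Matches lines such as:  caps mon = "allow r, ..."
CAPS_LINE = re.compile(r'^\s*caps\s+(\w+)\s*=\s*"(.*)"\s*$')


def over_escaped_caps(keyring_text):
    """Return the names of caps whose value contains two backslashes in a row."""
    suspicious = []
    for line in keyring_text.splitlines():
        match = CAPS_LINE.match(line)
        if match and "\\\\" in match.group(2):
            suspicious.append(match.group(1))
    return suspicious


if __name__ == "__main__":
    # Shortened copy of the bad keyring from this comment (key line omitted).
    bad_keyring = r'''[client.manila]
        caps mds = "allow *"
        caps mon = "allow r, allow command \\\"auth del\\\""
        caps osd = "allow rw"
'''
    print(over_escaped_caps(bad_keyring))
```

A value that only contains a single backslash before each double quote is not flagged.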


I rebuilt the keyring by hand:

[root@controller-1 ceph]# ceph-authtool /etc/ceph/ceph.client.manila.keyring \
    -n client.manila --cap mds 'allow *' --cap osd 'allow *' --cap mgr 'allow *' \
    --cap mon "allow r, allow command 'auth del', allow command 'auth caps', allow command 'auth get', allow command 'auth get-or-create'"
[root@controller-1 ceph]# ceph auth import -i /etc/ceph/ceph.client.manila.keyring
[root@controller-1 ceph]# cat /etc/ceph/ceph.client.manila.keyring
[client.manila]
        key = AQDSe+daAAAAABAAQ+8L/490ZS8AQefbKWwYdg==
        caps mds = "allow *"
        caps mgr = "allow *"
        caps mon = "allow r, allow command 'auth del', allow command 'auth caps', allow command 'auth get', allow command 'auth get-or-create'"
        caps osd = "allow *"

And now 'ceph -n client.manila --keyring=/etc/ceph/ceph.client.manila.keyring
mds dump' works and the manila-share log is showing successful eviction and
driver initialization.

Comment 3 Tom Barron 2018-05-05 11:47:30 UTC
Moving this to ceph-ansible since that is the component that builds the keyrings.

Comment 5 Tom Barron 2018-05-05 11:53:12 UTC
There are over 80 changes in ceph-ansible between the two builds
(the 2018-04-11 build uses 3.1.0-0.1.beta4 and the 2018-04-26.3 build uses
3.1.0-0.1.beta8).  Could this one be the culprit?

tbarron@tbarron ceph-ansible (beta-3.1)$ git show 42481550
commit 424815501a0c6072234a8e1311a0fefeb5bcc222
Author: Sébastien Han <seb>
Date:   Wed Apr 18 15:11:55 2018 +0200

   client: add quotes to the dict values
   ceph-authtool does not support raw arguements so we have to quote caps
   declaration like this allow 'bla bla' instead of allow bla bla
   Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1568157
   Signed-off-by: Sébastien Han <seb>

diff --git a/roles/ceph-client/tasks/create_users_keys.yml b/roles/ceph-client/tasks/create_users_keys.yml
index b5b012e1..36bbcc8e 100644
--- a/roles/ceph-client/tasks/create_users_keys.yml
+++ b/roles/ceph-client/tasks/create_users_keys.yml
@@ -1,7 +1,7 @@
---
- name: set_fact keys_tmp - preserve backward compatibility after the introduction of the ceph_keys module
  set_fact:
-    keys_tmp: "{{ keys_tmp|default([]) + [ { 'key': item.key, 'name': item.name, 'caps': { 'mon': item.mon_cap, 'osd': item.osd_cap|default(''), 'mds': item.mds_cap|default(''), 'mgr': item.mgr_cap|default('') } , 'mode': item.mode } ] }}"
+    keys_tmp: "{{ keys_tmp|default([]) + [ { 'key': item.key, 'name': item.name, 'caps': { 'mon': item.mon_cap|quote, 'osd': item.osd_cap|default('')|quote, 'mds': item.mds_cap|default('')|quote, 'mgr': item.mgr_cap|default('')|quote } , 'mode': item.mode } ] }}"
  when:
    - item.get('mon_cap', None) # it's enough to assume we are running an old-fashionned syntax simply by checking the presence of mon_cap since every key needs this cap
  with_items: "{{ keys }}"
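
To illustrate why added quoting could matter here, the sketch below uses Python's shlex module as a stand-in for shell quoting rules. Two things are assumptions, not facts from this report: that Ansible's quote filter behaves like shlex.quote, and that the cap value previously passed through a double-quoted shell context. The sample value is a shortened form of the mon_cap string carried by the templates.

```python
import shlex

# Shortened mon cap value with three backslashes before each double quote.
template_value = r'allow r, allow command \\\"auth del\\\"'

# shlex.quote wraps the value in single quotes, so a POSIX-style parse
# returns it unchanged and all three backslashes survive.
quoted = shlex.quote(template_value)
after_single_quotes = shlex.split(quoted)[0]

# Inside double quotes, POSIX rules collapse each backslash pair and
# unescape the quote, leaving one backslash before each double quote.
after_double_quotes = shlex.split('"' + template_value + '"')[0]

if __name__ == "__main__":
    print(quoted)
    print(after_single_quotes)
    print(after_double_quotes)
```

Under those assumptions, the single-quoted form matches the over-escaped keyring seen in comment 2, while the double-quoted form matches the shape of a working keyring.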

Comment 6 Giulio Fidente 2018-05-06 09:17:09 UTC
I will try to address this in the templates, seems the place where the fix should really go.

Comment 9 Tom Barron 2018-05-07 20:16:51 UTC
https://code.engineering.redhat.com/gerrit/137943 has merged.

Comment 11 Paul Grist 2018-05-07 20:40:56 UTC
The TL;DR is that we need this fix for proper deployment of Manila CephFS-NFS. Without it there is a work-around, but it requires a THT replacement and an OC rebuild, so ideally we would like to get this into beta.

Comment 16 Tom Barron 2018-05-16 22:36:14 UTC
Tested with 2018-05-07.2 puddle and manila-share initializes fine and ceph client eviction has no issues.  

2018-05-11 14:52:16.266 44 DEBUG ceph_volume_client [req-e906cb0f-2d83-4490-8bf7-fe284516a3f2 - - - - -] CephFS initializing... connect /usr/lib/python2.7/site-packages/ceph_volume_client.py:462
2018-05-11 14:52:16.268 44 DEBUG ceph_volume_client [req-e906cb0f-2d83-4490-8bf7-fe284516a3f2 - - - - -] Premount eviction of manila starting connect /usr/lib/python2.7/site-packages/ceph_volume_client.py:465
2018-05-11 14:52:16.269 44 INFO ceph_volume_client [req-e906cb0f-2d83-4490-8bf7-fe284516a3f2 - - - - -] evict clients with auth_name=manila
2018-05-11 14:52:16.274 44 DEBUG ceph_volume_client [-] _ready_to_evict: state=up:active _ready_to_evict /usr/lib/python2.7/site-packages/ceph_volume_client.py:125
2018-05-11 14:52:16.275 44 DEBUG ceph_volume_client [-] mds_command: 4260, ['session', 'evict', 'auth_name=manila'] _evict /usr/lib/python2.7/site-packages/ceph_volume_client.py:157
2018-05-11 14:52:16.705 44 DEBUG ceph_volume_client [-] mds_command: complete 0  _evict /usr/lib/python2.7/site-packages/ceph_volume_client.py:165
2018-05-11 14:52:16.706 44 DEBUG ceph_volume_client [-] _ready_to_evict: state=up:active _ready_to_evict /usr/lib/python2.7/site-packages/ceph_volume_client.py:125
2018-05-11 14:52:16.707 44 DEBUG ceph_volume_client [-] mds_command: 4271, ['session', 'evict', 'auth_name=manila'] _evict /usr/lib/python2.7/site-packages/ceph_volume_client.py:157
2018-05-11 14:52:16.709 44 DEBUG ceph_volume_client [-] mds_command: complete 0  _evict /usr/lib/python2.7/site-packages/ceph_volume_client.py:165
2018-05-11 14:52:16.712 44 DEBUG ceph_volume_client [-] _ready_to_evict: state=up:active _ready_to_evict /usr/lib/python2.7/site-packages/ceph_volume_client.py:125
2018-05-11 14:52:16.712 44 DEBUG ceph_volume_client [-] mds_command: 4263, ['session', 'evict', 'auth_name=manila'] _evict /usr/lib/python2.7/site-packages/ceph_volume_client.py:157
2018-05-11 14:52:16.719 44 DEBUG ceph_volume_client [-] mds_command: complete 0  _evict /usr/lib/python2.7/site-packages/ceph_volume_client.py:165
2018-05-11 14:52:16.719 44 INFO ceph_volume_client [req-e906cb0f-2d83-4490-8bf7-fe284516a3f2 - - - - -] evict: joined all
2018-05-11 14:52:16.720 44 DEBUG ceph_volume_client [req-e906cb0f-2d83-4490-8bf7-fe284516a3f2 - - - - -] Premount eviction of manila completes connect /usr/lib/python2.7/site-packages/ceph_volume_client.py:467
2018-05-11 14:52:16.720 44 DEBUG ceph_volume_client [req-e906cb0f-2d83-4490-8bf7-fe284516a3f2 - - - - -] CephFS mounting... connect /usr/lib/python2.7/site-packages/ceph_volume_client.py:468
2018-05-11 14:52:16.731 44 DEBUG ceph_volume_client [req-e906cb0f-2d83-4490-8bf7-fe284516a3f2 - - - - -] Connection to cephfs complete connect /usr/lib/python2.7/site-packages/ceph_volume_client.py:470
2018-05-11 14:52:16.732 44 DEBUG ceph_volume_client [req-e906cb0f-2d83-4490-8bf7-fe284516a3f2 - - - - -] Recovering from partial auth updates (if any)... recover /usr/lib/python2.7/site-packages/ceph_volume_client.py:265
2018-05-11 14:52:16.732 44 DEBUG ceph_volume_client [req-e906cb0f-2d83-4490-8bf7-fe284516a3f2 - - - - -] Nothing to recover. No auth meta files. recover /usr/lib/python2.7/site-packages/ceph_volume_client.py:270
2018-05-11 14:52:16.733 44 INFO manila.share.drivers.cephfs.driver [req-e906cb0f-2d83-4490-8bf7-fe284516a3f2 - - - - -] [cephfs] Ceph client connection complete.


manila client keyring is not over-escaped:

()[root@controller-1 /]# cat /etc/ceph/ceph.client.manila.keyring 
[client.manila]
	key = AQCtf/RaAAAAABAA2pG5oT2Kv93P/0hu9z105g==
	caps mds = "allow *"
	caps mgr = "allow *"
	caps mon = "allow r, allow command \"auth del\", allow command \"auth caps\", allow command \"auth get\", allow command \"auth get-or-create\""
	caps osd = "allow rw"
()[root@controller-1 /]# 

If QE is satisfied this one can go to VERIFIED.

Comment 17 Yogev Rabl 2018-05-18 20:03:07 UTC
verified

Comment 19 errata-xmlrpc 2018-06-27 13:55:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086