Bug 1591472 - ceph.client.openstack.keyring created with caps osd for empty pool when using disable-telemetry.yaml env
Summary: ceph.client.openstack.keyring created with caps osd for empty pool when using...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 14.0 (Rocky)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: Upstream M3
Target Release: 14.0 (Rocky)
Assignee: Giulio Fidente
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks: 1613474
Reported: 2018-06-14 19:50 UTC by Dimitri Savineau
Modified: 2019-04-11 21:01 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-9.0.0-0.20181001174823.90afd18.0rc2.0rc2.el7ost
Doc Type: Bug Fix
Doc Text:
Use of the disable-telemetry environment file no longer results in an invalid Ceph client keyring.
Clone Of:
Clones: 1613474
Environment:
Last Closed: 2019-01-11 11:50:06 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID              Private  Priority  Status  Summary                                                  Last Updated
Launchpad               1776987         0        None      None    None                                                     2018-06-14 20:59:04 UTC
OpenStack gerrit        585911          0        None      MERGED  Stop cap granting to empty pool when telemetry disabled  2020-09-15 02:45:09 UTC
OpenStack gerrit        604734          0        None      MERGED  Stop cap granting to empty pool when telemetry disabled  2020-09-15 02:45:07 UTC
Red Hat Product Errata  RHEA-2019:0045  0        None      None    None                                                     2019-01-11 11:50:31 UTC

Description Dimitri Savineau 2018-06-14 19:50:24 UTC
Description of problem:
Uploading an image to Glance fails with an RBD permission error.

Version-Release number of selected component (if applicable):
OSP14 2018-06-13.2

$ rpm -qa 'openstack-tripleo*'
openstack-tripleo-common-9.0.2-0.20180602091818.fb9a384.el7ost.noarch
openstack-tripleo-puppet-elements-9.0.0-0.20180602004307.939b586.el7ost.noarch
openstack-tripleo-ui-9.0.1-0.20180523202213.73df625.el7ost.noarch
openstack-tripleo-validations-9.0.1-0.20180601011353.9cf1f89.el7ost.noarch
openstack-tripleo-image-elements-9.0.0-0.20180601015717.2ac38dd.el7ost.noarch
openstack-tripleo-heat-templates-9.0.0-0.20180604091845.el7ost.noarch
openstack-tripleo-common-containers-9.0.2-0.20180602091818.fb9a384.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy the overcloud with /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml and ceph-ansible enabled.
2. Try to upload an image into Glance.

Actual results:

# from /var/log/containers/glance/api.log on the controller node

2018-06-14 19:33:46.206 26 INFO eventlet.wsgi.server [req-58741674-ba75-4e0b-8d8f-cf2c95d8514d 411a0e4136c54a8d8ec06e14683b8080 d35387e982e94606993d5eb9b346d432 - default default] 192.168.24.18 - - [14/Jun/2018 19:33:46] "POST /v2/images HTTP/1.1" 201 896 0.737768
2018-06-14 19:33:46.348 26 WARNING glance_store._drivers.rbd [req-00e7594b-bb79-4ba2-9d4f-960909dee171 411a0e4136c54a8d8ec06e14683b8080 d35387e982e94606993d5eb9b346d432 - default default] since image size is zero we will be doing resize-before-write for each chunk which will be considerably slower than normal
2018-06-14 19:33:46.358 26 ERROR glance.api.v2.image_data [req-00e7594b-bb79-4ba2-9d4f-960909dee171 411a0e4136c54a8d8ec06e14683b8080 d35387e982e94606993d5eb9b346d432 - default default] Failed to upload image data due to internal error: PermissionError: [errno 1] error creating image
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi [req-00e7594b-bb79-4ba2-9d4f-960909dee171 411a0e4136c54a8d8ec06e14683b8080 d35387e982e94606993d5eb9b346d432 - default default] Caught error: [errno 1] error creating image: PermissionError: [errno 1] error creating image
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi Traceback (most recent call last):
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi   File "/usr/lib/python2.7/site-packages/glance/common/wsgi.py", line 1256, in __call__
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi     request, **action_args)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi   File "/usr/lib/python2.7/site-packages/glance/common/wsgi.py", line 1299, in dispatch
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi     return method(*args, **kwargs)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi   File "/usr/lib/python2.7/site-packages/glance/common/utils.py", line 414, in wrapped
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi     return func(self, req, *args, **kwargs)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi   File "/usr/lib/python2.7/site-packages/glance/api/v2/image_data.py", line 267, in upload
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi     self._restore(image_repo, image)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi     self.force_reraise()
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi     six.reraise(self.type_, self.value, self.tb)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi   File "/usr/lib/python2.7/site-packages/glance/api/v2/image_data.py", line 132, in upload
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi     image.set_data(data, size)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi   File "/usr/lib/python2.7/site-packages/glance/domain/proxy.py", line 195, in set_data
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi     self.base.set_data(data, size)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi   File "/usr/lib/python2.7/site-packages/glance/notifier.py", line 480, in set_data
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi     _send_notification(notify_error, 'image.upload', msg)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi     self.force_reraise()
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi     six.reraise(self.type_, self.value, self.tb)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi   File "/usr/lib/python2.7/site-packages/glance/notifier.py", line 427, in set_data
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi     self.repo.set_data(data, size)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi   File "/usr/lib/python2.7/site-packages/glance/api/policy.py", line 193, in set_data
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi     return self.image.set_data(*args, **kwargs)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi   File "/usr/lib/python2.7/site-packages/glance/quota/__init__.py", line 304, in set_data
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi     self.image.set_data(data, size=size)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi   File "/usr/lib/python2.7/site-packages/glance/location.py", line 439, in set_data
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi     verifier=verifier)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi   File "/usr/lib/python2.7/site-packages/glance_store/backend.py", line 453, in add_to_backend
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi     verifier)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi   File "/usr/lib/python2.7/site-packages/glance_store/backend.py", line 426, in store_add_to_backend
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi     verifier=verifier)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi   File "/usr/lib/python2.7/site-packages/glance_store/capabilities.py", line 225, in op_checker
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi     return store_op_fun(store, *args, **kwargs)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi   File "/usr/lib/python2.7/site-packages/glance_store/_drivers/rbd.py", line 469, in add
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi     image_size, order)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi   File "/usr/lib/python2.7/site-packages/glance_store/_drivers/rbd.py", line 363, in _create_image
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi     features=int(features))
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi   File "rbd.pyx", line 753, in rbd.RBD.create (/builddir/build/BUILD/ceph-12.2.4/build/src/pybind/rbd/pyrex/rbd.c:5010)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi PermissionError: [errno 1] error creating image
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi

Expected results:
The image upload should succeed.

Additional info:

The Ceph cluster is healthy:
# ceph -s
  cluster:
    id:     cbcb9660-6ff9-11e8-8778-fa163eff2637
    health: HEALTH_OK
 
  services:
    mon: 1 daemons, quorum overcloud-controller-0
    mgr: overcloud-controller-0(active)
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active
 
  data:
    pools:   8 pools, 512 pgs
    objects: 187 objects, 1113 bytes
    usage:   326 MB used, 209 GB / 209 GB avail
    pgs:     512 active+clean

# ceph --version
ceph version 12.2.4-6.el7cp (78f60b924802e34d44f7078029a40dbe6c0c922f) luminous (stable)

# ceph df
GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED 
    209G      209G         326M          0.15 
POOLS:
    NAME                    ID     USED     %USED     MAX AVAIL     OBJECTS 
    images                  1         0         0          199G           0 
    backups                 2         0         0          199G           0 
    vms                     3         0         0          199G           0 
    volumes                 4         0         0          199G           0 
    .rgw.root               5      1113         0          199G           4 
    default.rgw.control     6         0         0          199G           8 
    default.rgw.meta        7         0         0          199G           0 
    default.rgw.log         8         0         0          199G         175

However, something looks wrong in the Ceph OpenStack keyring (used by Glance):
# cat /etc/ceph/ceph.client.openstack.keyring 
[client.openstack]
        key = AQApqCJbAAAAABAA4FZqczmue3pb+TW5DBmjhg==
        caps mds = ""
        caps mgr = "allow *"
        caps mon = "allow r"
        caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool="

As you can see at the end of the osd caps, there is an empty value after the final pool=.
This may be related to commit [1], which avoids creating the RBD pool when telemetry is disabled: the pool does not exist, but the associated capability still appears to be generated with an empty pool name.
I'm not sure whether this is the root cause of the RBD permission error.

[1] https://github.com/openstack/tripleo-heat-templates/commit/959cb6c5391bae657113ec6f69abe1a7cc277ee5
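
A quick way to spot this condition in a keyring is to parse the caps lines and look for a grant that ends in "pool=". The following is a minimal sketch in Python, not part of TripleO or Ceph; the helper names parse_caps and empty_pool_grants are hypothetical, and the key line is omitted from the sample text.

```python
import re

# Matches lines such as:   caps osd = "allow rwx pool=images"
CAPS_LINE = re.compile(r'^\s*caps\s+(\w+)\s*=\s*"(.*)"\s*$', re.MULTILINE)


def parse_caps(keyring_text):
    """Map each daemon type (mds, mgr, mon, osd) to its capability string."""
    return dict(CAPS_LINE.findall(keyring_text))


def empty_pool_grants(osd_caps):
    """Return the comma-separated grants whose pool name is empty."""
    grants = [g.strip() for g in osd_caps.split(",")]
    return [g for g in grants if g.endswith("pool=")]


# Sample taken from the keyring above (key line left out on purpose).
KEYRING = '''[client.openstack]
        caps mds = ""
        caps mgr = "allow *"
        caps mon = "allow r"
        caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool="
'''

caps = parse_caps(KEYRING)
print(empty_pool_grants(caps["osd"]))  # ['allow rwx pool=']
```

An empty list would mean that no grant in the osd caps refers to an unnamed pool.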

Comment 1 Dimitri Savineau 2018-06-14 20:11:18 UTC
I can confirm that after updating the client.openstack OSD capabilities to remove the extra ", allow rwx pool=", the error no longer occurs:

# ceph auth get client.openstack
exported keyring for client.openstack
[client.openstack]
        key = AQApqCJbAAAAABAA4FZqczmue3pb+TW5DBmjhg==
        caps mds = ""
        caps mgr = "allow *"
        caps mon = "allow r"
        caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images"
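
The manual edit can also be expressed as a small helper that drops any grant with an empty pool name. This is only a sketch in Python under the assumption that grants are comma-separated, as in the keyring shown here; the function name is hypothetical, and applying the result to the cluster (for example with ceph auth caps) is not shown.

```python
def drop_empty_pool_grants(osd_caps):
    """Return the caps string without grants that end in an empty 'pool='."""
    grants = [g.strip() for g in osd_caps.split(",")]
    kept = [g for g in grants if g and not g.endswith("pool=")]
    return ", ".join(kept)


broken = (
    "allow class-read object_prefix rbd_children, allow rwx pool=volumes, "
    "allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, "
    "allow rwx pool="
)

fixed = drop_empty_pool_grants(broken)
print(fixed)
```

For the broken string from the description, the result is the same osd caps value as in the corrected keyring.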


2018-06-14 20:07:33.562 26 INFO eventlet.wsgi.server [req-2ff67350-6477-4423-a173-ee6a0aba3932 411a0e4136c54a8d8ec06e14683b8080 d35387e982e94606993d5eb9b346d432 - default default] 192.168.24.18 - - [14/Jun/2018 20:07:33] "GET /v2/schemas/image HTTP/1.1" 200 4359 0.380123
2018-06-14 20:07:33.715 26 INFO eventlet.wsgi.server [req-512e8040-4adb-406d-9004-ba90e9bc3a22 411a0e4136c54a8d8ec06e14683b8080 d35387e982e94606993d5eb9b346d432 - default default] 192.168.24.18 - - [14/Jun/2018 20:07:33] "POST /v2/images HTTP/1.1" 201 896 0.108192
2018-06-14 20:07:33.865 26 WARNING glance_store._drivers.rbd [req-5fcdd139-e407-4e66-b21b-9e8949cf7b55 411a0e4136c54a8d8ec06e14683b8080 d35387e982e94606993d5eb9b346d432 - default default] since image size is zero we will be doing resize-before-write for each chunk which will be considerably slower than normal
2018-06-14 20:07:36.332 26 INFO eventlet.wsgi.server [req-5fcdd139-e407-4e66-b21b-9e8949cf7b55 411a0e4136c54a8d8ec06e14683b8080 d35387e982e94606993d5eb9b346d432 - default default] 192.168.24.18 - - [14/Jun/2018 20:07:36] "PUT /v2/images/39703453-cc6e-4000-aa07-bdbd8cf001db/file HTTP/1.1" 204 189 2.603067
2018-06-14 20:07:36.363 26 INFO eventlet.wsgi.server [req-c02646b8-5d03-4b1e-8ca0-7a2e6d609cc5 411a0e4136c54a8d8ec06e14683b8080 d35387e982e94606993d5eb9b346d432 - default default] 192.168.24.18 - - [14/Jun/2018 20:07:36] "GET /v2/images/39703453-cc6e-4000-aa07-bdbd8cf001db HTTP/1.1" 200 1016 0.024981
2018-06-14 20:07:36.371 26 INFO eventlet.wsgi.server [req-2335288f-8ec1-4137-b964-886e0088322f 411a0e4136c54a8d8ec06e14683b8080 d35387e982e94606993d5eb9b346d432 - default default] 192.168.24.18 - - [14/Jun/2018 20:07:36] "GET /v2/schemas/image HTTP/1.1" 200 4359 0.004027

Comment 2 John Fulton 2018-06-14 21:03:20 UTC
Looking at the environment where this bug was discovered, I see that TripleO generated ceph-ansible's input with an empty pool name.

cat /var/lib/mistral/b44a3d6f-077c-4481-b13b-a3c1fc34bca2/ceph-ansible/group_vars/all.yml
...

openstack_keys:
-   key: AQApqCJbAAAAABAA4FZqczmue3pb+TW5DBmjhg==
    mgr_cap: allow *
    mode: '0600'
    mon_cap: allow r
    name: client.openstack
    osd_cap: allow class-read object_prefix rbd_children, allow rwx pool=volumes,
        allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx
        pool=
-   key:

...
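
To illustrate how an osd_cap like this can come about, here is a minimal Python sketch. It is not the actual tripleo-heat-templates logic (that lives in Heat templates); it only assumes that one grant is generated per configured pool name and that the disabled telemetry pool arrives as an empty string. All names in it are hypothetical.

```python
BASE_GRANT = "allow class-read object_prefix rbd_children"


def naive_osd_cap(pool_names):
    # One grant per entry, even when the entry is an empty string.
    return ", ".join([BASE_GRANT] + [f"allow rwx pool={p}" for p in pool_names])


def filtered_osd_cap(pool_names):
    # Skip empty names so no grant is emitted for a pool that does not exist.
    return ", ".join(
        [BASE_GRANT] + [f"allow rwx pool={p}" for p in pool_names if p]
    )


# Assumption: with telemetry disabled, the last pool name is empty.
pools = ["volumes", "backups", "vms", "images", ""]

print(naive_osd_cap(pools))     # ends with "allow rwx pool="
print(filtered_osd_cap(pools))  # ends with "allow rwx pool=images"
```

Filtering out empty names before building the string is in the spirit of the upstream change title, "Stop cap granting to empty pool when telemetry disabled".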

Comment 3 Dimitri Savineau 2018-07-16 17:39:17 UTC
The fix proposed upstream (575571) and included in the OSP14 puddle 2018-07-09.1 (passed_phase1) does not work.
See the upstream bug (1776987) for more information.

$ rpm -qa 'openstack-tripleo*'
openstack-tripleo-common-9.1.1-0.20180703132151.648aa43.el7ost.noarch
openstack-tripleo-image-elements-9.0.0-0.20180702150810.3094693.el7ost.noarch
openstack-tripleo-ui-9.1.1-0.20180702224622.d3d7221.el7ost.noarch
openstack-tripleo-heat-templates-9.0.0-0.20180703131156.de62fe3.el7ost.noarch
openstack-tripleo-puppet-elements-9.0.0-0.20180602004307.939b586.el7ost.noarch
openstack-tripleo-validations-9.1.1-0.20180618123656.d21e7fa.el7ost.noarch
openstack-tripleo-common-containers-9.1.1-0.20180703132151.648aa43.el7ost.noarc

Comment 5 Dimitri Savineau 2018-08-02 15:28:18 UTC
The bug is now present in OSP13 (2018-07-30.2) as well.

Should I clone this bug for OSP13 too?

Note that there are extra single quotes in that release:

$ cat ceph.client.openstack.keyring 
[client.openstack]
	key = AQCxZ2JbAAAAABAAWD4QA62TU7eCbYLoOLCjCg==
	caps mds = "''"
	caps mgr = "'allow *'"
	caps mon = "'profile rbd'"
	caps osd = "'profile rbd pool=volumes, profile rbd pool=backups, profile rbd pool=vms, profile rbd pool=images, profile rbd pool='"
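
A check for the empty pool grant needs to tolerate the extra single quotes seen in this release. The sketch below, in Python with hypothetical helper names, strips one pair of wrapping single quotes before looking for grants that end in "pool=". It makes no claim about where the quotes come from.

```python
def strip_wrapping_quotes(value):
    """Remove one pair of wrapping single quotes, if present."""
    if len(value) >= 2 and value[0] == value[-1] == "'":
        return value[1:-1]
    return value


def empty_pool_grants(osd_caps):
    """Return grants with an empty pool name, ignoring wrapping quotes."""
    unquoted = strip_wrapping_quotes(osd_caps)
    grants = [g.strip() for g in unquoted.split(",")]
    return [g for g in grants if g.endswith("pool=")]


osp13_osd_caps = (
    "'profile rbd pool=volumes, profile rbd pool=backups, "
    "profile rbd pool=vms, profile rbd pool=images, profile rbd pool='"
)

print(empty_pool_grants(osp13_osd_caps))  # ['profile rbd pool=']
```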

Comment 9 Gal Amado 2018-11-11 16:40:46 UTC
Verified on: core_puddle=2018-11-07.2

(undercloud) [stack@undercloud-0 ~]$ rpm -qa 'openstack-tripleo*'
openstack-tripleo-validations-9.3.1-0.20181008110747.4064fb7.el7ost.noarch
openstack-tripleo-common-9.4.1-0.20181012010870.67bab16.el7ost.noarch
openstack-tripleo-common-containers-9.4.1-0.20181012010870.67bab16.el7ost.noarch
openstack-tripleo-puppet-elements-9.0.0-0.20181007201103.daf9069.el7ost.noarch
openstack-tripleo-image-elements-9.0.1-0.20181007200834.2dc678a.el7ost.noarch
openstack-tripleo-heat-templates-9.0.1-0.20181013060873.el7ost.noarch
(undercloud) [stack@undercloud-0 ~]$ 




After a successful deployment with /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml, I uploaded a CirrOS image with:
(undercloud) [stack@undercloud-0 ~]$ wget http://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img
(undercloud) [stack@undercloud-0 ~]$ openstack image create --disk-format qcow2 --container-format bare   --public --file ./cirros-0.4.0-x86_64-disk.img my_cirros

Verified success: none of the errors from the description appear in the Glance API log:
[root@controller-0 ~]# grep -i ERROR /var/log/containers/glance/api.log 
[root@controller-0 ~]# 

Verified success via the image list:
(overcloud) [stack@undercloud-0 ~]$ openstack image list
+--------------------------------------+----------------------------------+--------+
| ID                                   | Name                             | Status |
+--------------------------------------+----------------------------------+--------+
| 73534420-f191-4e85-b922-a1a34b6545fc | cirros-0.3.5-x86_64-disk.img     | active |
| 8445b83d-2800-43a5-b77b-e7d0359f2e9a | cirros-0.3.5-x86_64-disk.img_alt | active |
| 586c2cd3-6f93-496e-bd25-ef15742497ca | my_cirros                        | active |
+--------------------------------------+----------------------------------+--------+

Comment 12 errata-xmlrpc 2019-01-11 11:50:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045

