Description of problem:
Uploading an image into Glance fails with an RBD permission error.

Version-Release number of selected component (if applicable):
OSP14 2018-06-13.2

$ rpm -qa 'openstack-tripleo*'
openstack-tripleo-common-9.0.2-0.20180602091818.fb9a384.el7ost.noarch
openstack-tripleo-puppet-elements-9.0.0-0.20180602004307.939b586.el7ost.noarch
openstack-tripleo-ui-9.0.1-0.20180523202213.73df625.el7ost.noarch
openstack-tripleo-validations-9.0.1-0.20180601011353.9cf1f89.el7ost.noarch
openstack-tripleo-image-elements-9.0.0-0.20180601015717.2ac38dd.el7ost.noarch
openstack-tripleo-heat-templates-9.0.0-0.20180604091845.el7ost.noarch
openstack-tripleo-common-containers-9.0.2-0.20180602091818.fb9a384.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy the overcloud with /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml and ceph-ansible enabled.
2. Try to upload an image into Glance.

Actual results:
# from /var/log/containers/glance/api.log on the controller node
2018-06-14 19:33:46.206 26 INFO eventlet.wsgi.server [req-58741674-ba75-4e0b-8d8f-cf2c95d8514d 411a0e4136c54a8d8ec06e14683b8080 d35387e982e94606993d5eb9b346d432 - default default] 192.168.24.18 - - [14/Jun/2018 19:33:46] "POST /v2/images HTTP/1.1" 201 896 0.737768
2018-06-14 19:33:46.348 26 WARNING glance_store._drivers.rbd [req-00e7594b-bb79-4ba2-9d4f-960909dee171 411a0e4136c54a8d8ec06e14683b8080 d35387e982e94606993d5eb9b346d432 - default default] since image size is zero we will be doing resize-before-write for each chunk which will be considerably slower than normal
2018-06-14 19:33:46.358 26 ERROR glance.api.v2.image_data [req-00e7594b-bb79-4ba2-9d4f-960909dee171 411a0e4136c54a8d8ec06e14683b8080 d35387e982e94606993d5eb9b346d432 - default default] Failed to upload image data due to internal error: PermissionError: [errno 1] error creating image
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi [req-00e7594b-bb79-4ba2-9d4f-960909dee171 411a0e4136c54a8d8ec06e14683b8080 d35387e982e94606993d5eb9b346d432 - default default] Caught error: [errno 1] error creating image: PermissionError: [errno 1] error creating image
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi Traceback (most recent call last):
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi File "/usr/lib/python2.7/site-packages/glance/common/wsgi.py", line 1256, in __call__
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi request, **action_args)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi File "/usr/lib/python2.7/site-packages/glance/common/wsgi.py", line 1299, in dispatch
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi return method(*args, **kwargs)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi File "/usr/lib/python2.7/site-packages/glance/common/utils.py", line 414, in wrapped
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi return func(self, req, *args, **kwargs)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi File "/usr/lib/python2.7/site-packages/glance/api/v2/image_data.py", line 267, in upload
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi self._restore(image_repo, image)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi self.force_reraise()
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi six.reraise(self.type_, self.value, self.tb)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi File "/usr/lib/python2.7/site-packages/glance/api/v2/image_data.py", line 132, in upload
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi image.set_data(data, size)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi File "/usr/lib/python2.7/site-packages/glance/domain/proxy.py", line 195, in set_data
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi self.base.set_data(data, size)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi File "/usr/lib/python2.7/site-packages/glance/notifier.py", line 480, in set_data
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi _send_notification(notify_error, 'image.upload', msg)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi self.force_reraise()
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi six.reraise(self.type_, self.value, self.tb)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi File "/usr/lib/python2.7/site-packages/glance/notifier.py", line 427, in set_data
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi self.repo.set_data(data, size)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi File "/usr/lib/python2.7/site-packages/glance/api/policy.py", line 193, in set_data
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi return self.image.set_data(*args, **kwargs)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi File "/usr/lib/python2.7/site-packages/glance/quota/__init__.py", line 304, in set_data
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi self.image.set_data(data, size=size)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi File "/usr/lib/python2.7/site-packages/glance/location.py", line 439, in set_data
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi verifier=verifier)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi File "/usr/lib/python2.7/site-packages/glance_store/backend.py", line 453, in add_to_backend
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi verifier)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi File "/usr/lib/python2.7/site-packages/glance_store/backend.py", line 426, in store_add_to_backend
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi verifier=verifier)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi File "/usr/lib/python2.7/site-packages/glance_store/capabilities.py", line 225, in op_checker
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi return store_op_fun(store, *args, **kwargs)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi File "/usr/lib/python2.7/site-packages/glance_store/_drivers/rbd.py", line 469, in add
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi image_size, order)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi File "/usr/lib/python2.7/site-packages/glance_store/_drivers/rbd.py", line 363, in _create_image
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi features=int(features))
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi File "rbd.pyx", line 753, in rbd.RBD.create (/builddir/build/BUILD/ceph-12.2.4/build/src/pybind/rbd/pyrex/rbd.c:5010)
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi PermissionError: [errno 1] error creating image
2018-06-14 19:33:46.979 26 ERROR glance.common.wsgi

Expected results:
The image upload should succeed.

Additional info:
The Ceph cluster is healthy:

# ceph -s
  cluster:
    id:     cbcb9660-6ff9-11e8-8778-fa163eff2637
    health: HEALTH_OK

  services:
    mon: 1 daemons, quorum overcloud-controller-0
    mgr: overcloud-controller-0(active)
    osd: 3 osds: 3 up, 3 in
    rgw: 1 daemon active

  data:
    pools:   8 pools, 512 pgs
    objects: 187 objects, 1113 bytes
    usage:   326 MB used, 209 GB / 209 GB avail
    pgs:     512 active+clean

# ceph --version
ceph version 12.2.4-6.el7cp (78f60b924802e34d44f7078029a40dbe6c0c922f) luminous (stable)

# ceph df
GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    209G     209G      326M         0.15
POOLS:
    NAME                    ID     USED     %USED     MAX AVAIL     OBJECTS
    images                  1      0        0         199G          0
    backups                 2      0        0         199G          0
    vms                     3      0        0         199G          0
    volumes                 4      0        0         199G          0
    .rgw.root               5      1113     0         199G          4
    default.rgw.control     6      0        0         199G          8
    default.rgw.meta        7      0        0         199G          0
    default.rgw.log         8      0        0         199G          175

But something is odd in the Ceph openstack keyring (used by Glance):

# cat /etc/ceph/ceph.client.openstack.keyring
[client.openstack]
        key = AQApqCJbAAAAABAA4FZqczmue3pb+TW5DBmjhg==
        caps mds = ""
        caps mgr = "allow *"
        caps mon = "allow r"
        caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool="

As you can see at the end of the osd caps, there is an empty value after pool=. This may be related to commit [1], which avoids creating the RBD pool (which doesn't exist) when telemetry is disabled; the associated auth entry, however, still seems to be created with an empty pool name. I'm not sure whether this is the root cause of the RBD permission error.

[1] https://github.com/openstack/tripleo-heat-templates/commit/959cb6c5391bae657113ec6f69abe1a7cc277ee5
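To make the anomaly in the keyring easier to spot, here is a minimal Python sketch that scans an osd caps string for grants whose pool= value is empty. The helper name find_empty_pool_grants is hypothetical and is not part of Ceph or TripleO; the sample string is the one from the keyring above.

```python
def find_empty_pool_grants(osd_caps):
    """Return the comma-separated grants that contain a bare 'pool=' token.

    Hypothetical helper for illustration only; it does plain string
    splitting and does not implement Ceph's real capability grammar.
    """
    bad = []
    for grant in osd_caps.split(","):
        grant = grant.strip()
        # A grant is suspicious if one of its tokens is exactly "pool=".
        if "pool=" in grant.split():
            bad.append(grant)
    return bad


# osd caps copied from the keyring shown in this report.
caps = (
    "allow class-read object_prefix rbd_children, allow rwx pool=volumes, "
    "allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, "
    "allow rwx pool="
)

print(find_empty_pool_grants(caps))
```

Run against a healthy caps string, the function returns an empty list, so it could serve as a quick post-deployment check.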
Alright, I can confirm that after updating the client.openstack osd capabilities to remove the extra ", allow rwx pool=", I no longer see the error:

# ceph auth get client.openstack
exported keyring for client.openstack
[client.openstack]
        key = AQApqCJbAAAAABAA4FZqczmue3pb+TW5DBmjhg==
        caps mds = ""
        caps mgr = "allow *"
        caps mon = "allow r"
        caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images"

2018-06-14 20:07:33.562 26 INFO eventlet.wsgi.server [req-2ff67350-6477-4423-a173-ee6a0aba3932 411a0e4136c54a8d8ec06e14683b8080 d35387e982e94606993d5eb9b346d432 - default default] 192.168.24.18 - - [14/Jun/2018 20:07:33] "GET /v2/schemas/image HTTP/1.1" 200 4359 0.380123
2018-06-14 20:07:33.715 26 INFO eventlet.wsgi.server [req-512e8040-4adb-406d-9004-ba90e9bc3a22 411a0e4136c54a8d8ec06e14683b8080 d35387e982e94606993d5eb9b346d432 - default default] 192.168.24.18 - - [14/Jun/2018 20:07:33] "POST /v2/images HTTP/1.1" 201 896 0.108192
2018-06-14 20:07:33.865 26 WARNING glance_store._drivers.rbd [req-5fcdd139-e407-4e66-b21b-9e8949cf7b55 411a0e4136c54a8d8ec06e14683b8080 d35387e982e94606993d5eb9b346d432 - default default] since image size is zero we will be doing resize-before-write for each chunk which will be considerably slower than normal
2018-06-14 20:07:36.332 26 INFO eventlet.wsgi.server [req-5fcdd139-e407-4e66-b21b-9e8949cf7b55 411a0e4136c54a8d8ec06e14683b8080 d35387e982e94606993d5eb9b346d432 - default default] 192.168.24.18 - - [14/Jun/2018 20:07:36] "PUT /v2/images/39703453-cc6e-4000-aa07-bdbd8cf001db/file HTTP/1.1" 204 189 2.603067
2018-06-14 20:07:36.363 26 INFO eventlet.wsgi.server [req-c02646b8-5d03-4b1e-8ca0-7a2e6d609cc5 411a0e4136c54a8d8ec06e14683b8080 d35387e982e94606993d5eb9b346d432 - default default] 192.168.24.18 - - [14/Jun/2018 20:07:36] "GET /v2/images/39703453-cc6e-4000-aa07-bdbd8cf001db HTTP/1.1" 200 1016 0.024981
2018-06-14 20:07:36.371 26 INFO eventlet.wsgi.server [req-2335288f-8ec1-4137-b964-886e0088322f 411a0e4136c54a8d8ec06e14683b8080 d35387e982e94606993d5eb9b346d432 - default default] 192.168.24.18 - - [14/Jun/2018 20:07:36] "GET /v2/schemas/image HTTP/1.1" 200 4359 0.004027
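The comment does not show the exact command used for the workaround. As a rough sketch, the capability update could be prepared as below: the empty grant is dropped and an argument list for `ceph auth caps` is built. Since `ceph auth caps` replaces all capabilities of the entity, the mon and mgr caps are passed again alongside the corrected osd caps. The helper names are hypothetical, and the command is only printed here, not executed.

```python
def drop_empty_pool_grants(osd_caps):
    """Remove grants containing a bare 'pool=' token (illustrative helper)."""
    grants = [g.strip() for g in osd_caps.split(",")]
    kept = [g for g in grants if g and "pool=" not in g.split()]
    return ", ".join(kept)


def build_auth_caps_argv(entity, caps):
    """Build the argv for `ceph auth caps`, skipping empty capability values."""
    argv = ["ceph", "auth", "caps", entity]
    for daemon in ("mon", "mgr", "osd"):
        if caps.get(daemon):
            argv += [daemon, caps[daemon]]
    return argv


# Broken osd caps as reported in the description.
broken = (
    "allow class-read object_prefix rbd_children, allow rwx pool=volumes, "
    "allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, "
    "allow rwx pool="
)

fixed_caps = {
    "mon": "allow r",
    "mgr": "allow *",
    "osd": drop_empty_pool_grants(broken),
}

# Review the argument list before running it on a monitor node.
print(build_auth_caps_argv("client.openstack", fixed_caps))
```

This only repairs a running cluster by hand; a redeployment would regenerate the keyring from the ceph-ansible input.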
While looking at the environment where this bug was discovered, I see that TripleO set up ceph-ansible's input with an empty pool:

cat /var/lib/mistral/b44a3d6f-077c-4481-b13b-a3c1fc34bca2/ceph-ansible/group_vars/all.yml
...
openstack_keys:
- key: AQApqCJbAAAAABAA4FZqczmue3pb+TW5DBmjhg==
  mgr_cap: allow *
  mode: '0600'
  mon_cap: allow r
  name: client.openstack
  osd_cap: allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=
- key: ...
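One way such an osd_cap value can arise is by joining a list of pool names in which a disabled service contributes an empty name. The sketch below is not the TripleO template logic (which is not shown in this report); it only illustrates, with a hypothetical build_osd_cap function, how filtering out empty names before joining avoids the trailing "pool=" grant.

```python
def build_osd_cap(pool_names, access="allow rwx"):
    """Join pool names into an osd capability string, skipping empty names.

    Illustrative only: the function name and signature are assumptions,
    not taken from tripleo-heat-templates or ceph-ansible.
    """
    grants = ["allow class-read object_prefix rbd_children"]
    for name in pool_names:
        # An empty or whitespace-only name would produce a bare "pool=".
        if not name or not name.strip():
            continue
        grants.append(f"{access} pool={name.strip()}")
    return ", ".join(grants)


# The trailing empty string stands in for the pool of a disabled service.
pools = ["volumes", "backups", "vms", "images", ""]
print(build_osd_cap(pools))
```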
The fix proposed upstream (575571) and included in the OSP14 puddle 2018-07-09.1 (passed_phase1) does not work. See the upstream bug (1776987) for more information.

$ rpm -qa 'openstack-tripleo*'
openstack-tripleo-common-9.1.1-0.20180703132151.648aa43.el7ost.noarch
openstack-tripleo-image-elements-9.0.0-0.20180702150810.3094693.el7ost.noarch
openstack-tripleo-ui-9.1.1-0.20180702224622.d3d7221.el7ost.noarch
openstack-tripleo-heat-templates-9.0.0-0.20180703131156.de62fe3.el7ost.noarch
openstack-tripleo-puppet-elements-9.0.0-0.20180602004307.939b586.el7ost.noarch
openstack-tripleo-validations-9.1.1-0.20180618123656.d21e7fa.el7ost.noarch
openstack-tripleo-common-containers-9.1.1-0.20180703132151.648aa43.el7ost.noarc
The bug is now present in OSP13 (2018-07-30.2) as well. Should I clone this bug for OSP13 too?

Note that there are extra single quotes in that release:

$ cat ceph.client.openstack.keyring
[client.openstack]
        key = AQCxZ2JbAAAAABAAWD4QA62TU7eCbYLoOLCjCg==
        caps mds = "''"
        caps mgr = "'allow *'"
        caps mon = "'profile rbd'"
        caps osd = "'profile rbd pool=volumes, profile rbd pool=backups, profile rbd pool=vms, profile rbd pool=images, profile rbd pool='"
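For comparing keyrings across releases, a small Python sketch like the following can read the keyring text, peel off the nested quoting seen in OSP13, and expose the underlying caps. parse_keyring and unquote are hypothetical helpers written for this illustration; they do simple line parsing and are not a Ceph API.

```python
def parse_keyring(text):
    """Parse '[entity]' sections and 'name = value' lines into nested dicts."""
    entities = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = entities.setdefault(line[1:-1], {})
            continue
        if current is None or "=" not in line:
            continue
        # Split on the first '=' only, so keys ending in '==' stay intact.
        name, _, value = line.partition("=")
        current[name.strip()] = value.strip()
    return entities


def unquote(value):
    """Strip every layer of matching surrounding quotes."""
    while len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        value = value[1:-1]
    return value


# Keyring text copied from the OSP13 comment above.
keyring_text = """
[client.openstack]
    key = AQCxZ2JbAAAAABAAWD4QA62TU7eCbYLoOLCjCg==
    caps mds = "''"
    caps mgr = "'allow *'"
    caps mon = "'profile rbd'"
    caps osd = "'profile rbd pool=volumes, profile rbd pool=backups, profile rbd pool=vms, profile rbd pool=images, profile rbd pool='"
"""

entry = parse_keyring(keyring_text)["client.openstack"]
for name, value in entry.items():
    if name.startswith("caps"):
        print(name, "->", repr(unquote(value)))
```

After unquoting, the osd caps still end in an empty "pool=", which matches the observation that the same bug is present in that release.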
Verified on core_puddle=2018-11-07.2:

(undercloud) [stack@undercloud-0 ~]$ rpm -qa 'openstack-tripleo*'
openstack-tripleo-validations-9.3.1-0.20181008110747.4064fb7.el7ost.noarch
openstack-tripleo-common-9.4.1-0.20181012010870.67bab16.el7ost.noarch
openstack-tripleo-common-containers-9.4.1-0.20181012010870.67bab16.el7ost.noarch
openstack-tripleo-puppet-elements-9.0.0-0.20181007201103.daf9069.el7ost.noarch
openstack-tripleo-image-elements-9.0.1-0.20181007200834.2dc678a.el7ost.noarch
openstack-tripleo-heat-templates-9.0.1-0.20181013060873.el7ost.noarch
(undercloud) [stack@undercloud-0 ~]$

After a successful deployment with /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml, uploaded a cirros image with:

(undercloud) [stack@undercloud-0 ~]$ wget http://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img
(undercloud) [stack@undercloud-0 ~]$ openstack image create --disk-format qcow2 --container-format bare --public --file ./cirros-0.4.0-x86_64-disk.img my_cirros

Verified success: the error from the description no longer appears in the log:

[root@controller-0 ~]# grep -i ERROR /var/log/containers/glance/api.log
[root@controller-0 ~]#

Verified success with the image list:

(overcloud) [stack@undercloud-0 ~]$ openstack image list
+--------------------------------------+----------------------------------+--------+
| ID                                   | Name                             | Status |
+--------------------------------------+----------------------------------+--------+
| 73534420-f191-4e85-b922-a1a34b6545fc | cirros-0.3.5-x86_64-disk.img     | active |
| 8445b83d-2800-43a5-b77b-e7d0359f2e9a | cirros-0.3.5-x86_64-disk.img_alt | active |
| 586c2cd3-6f93-496e-bd25-ef15742497ca | my_cirros                        | active |
+--------------------------------------+----------------------------------+--------+
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2019:0045