Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1570007

Summary: glance has issue with communicating with ceph after fresh install
Product: Red Hat OpenStack Reporter: Attila Fazekas <afazekas>
Component: puppet-tripleoAssignee: RHOS Maint <rhos-maint>
Status: CLOSED NOTABUG QA Contact: nlevinki <nlevinki>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 10.0 (Newton)CC: akekane, eglynn, fhubik, fpercoco, jjoyce, johfulto, jschluet, slinaber, srevivo, tvignaud
Target Milestone: zstreamKeywords: Regression, Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-25 03:20:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Attila Fazekas 2018-04-20 12:37:00 UTC
Description of problem:

We are unable upload images:
var/log/glance/api.log:
2018-04-19 19:31:32.865 139939 DEBUG glance_store.location [req-dfb1bf42-2abe-45b3-9c1f-e4279668119c e52a82a36b8243778df0257afda656bf acd27d64a1a541f1be0fb8619ef2ce52 - default default] Registering scheme rbd with {'location_class': <class 'glance_store._drivers.rbd.StoreLocation'>, 'store': <glance_store._drivers.rbd.Store object at 0x7f5cd587e250>, 'store_entry': 'glance.store.rbd.Store'} register_scheme_map /usr/lib/python2.7/site-packages/glance_store/location.py:88
2018-04-19 19:31:32.899 139939 ERROR glance_store._drivers.rbd [req-dfb1bf42-2abe-45b3-9c1f-e4279668119c e52a82a36b8243778df0257afda656bf acd27d64a1a541f1be0fb8619ef2ce52 - default default] Error connecting to ceph cluster.
2018-04-19 19:31:32.899 139939 ERROR glance_store._drivers.rbd Traceback (most recent call last):
2018-04-19 19:31:32.899 139939 ERROR glance_store._drivers.rbd   File "/usr/lib/python2.7/site-packages/glance_store/_drivers/rbd.py", line 273, in get_connection
2018-04-19 19:31:32.899 139939 ERROR glance_store._drivers.rbd     client.connect(timeout=self.connect_timeout)
2018-04-19 19:31:32.899 139939 ERROR glance_store._drivers.rbd   File "rados.pyx", line 785, in rados.Rados.connect (rados.c:8969)
2018-04-19 19:31:32.899 139939 ERROR glance_store._drivers.rbd ObjectNotFound: error connecting to the cluster
2018-04-19 19:31:32.899 139939 ERROR glance_store._drivers.rbd 
2018-04-19 19:31:32.901 139939 DEBUG oslo_messaging._drivers.amqpdriver [req-dfb1bf42-2abe-45b3-9c1f-e4279668119c e52a82a36b8243778df0257afda656bf acd27d64a1a541f1be0fb8619ef2ce52 - default default] CAST unique_id: fbddddb9c31e417aadbc8b531be102e2 NOTIFY exchange 'glance' topic 'notifications.error' _send /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:552
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data [req-dfb1bf42-2abe-45b3-9c1f-e4279668119c e52a82a36b8243778df0257afda656bf acd27d64a1a541f1be0fb8619ef2ce52 - default default] Failed to upload image data due to internal error
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data Traceback (most recent call last):
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance/api/v2/image_data.py", line 115, in upload
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     image.set_data(data, size)
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance/domain/proxy.py", line 195, in set_data
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     self.base.set_data(data, size)
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance/notifier.py", line 479, in set_data
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     _send_notification(notify_error, 'image.upload', msg)
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     self.force_reraise()
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     six.reraise(self.type_, self.value, self.tb)
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance/notifier.py", line 426, in set_data
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     self.repo.set_data(data, size)
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance/api/policy.py", line 187, in set_data
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     return self.image.set_data(*args, **kwargs)
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance/quota/__init__.py", line 304, in set_data
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     self.image.set_data(data, size=size)
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance/location.py", line 440, in set_data
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     verifier=verifier)
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance_store/backend.py", line 452, in add_to_backend
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     verifier)
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance_store/backend.py", line 425, in store_add_to_backend
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     verifier=verifier)
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance_store/capabilities.py", line 225, in op_checker
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     return store_op_fun(store, *args, **kwargs)
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance_store/_drivers/rbd.py", line 447, in add
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     rados_id=self.user) as conn:
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     return self.gen.next()
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance_store/_drivers/rbd.py", line 277, in get_connection
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     raise exceptions.BackendException()
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data BackendException
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data 


However the ceph healt looks ok.

Version-Release number of selected component (if applicable):
ceph-base.x86_64                    2:10.2.10-17.el7cp 
puppet-ceph.noarch                  2.4.1-2.el7ost
python-rados.x86_64                 2:10.2.10-17.el7cp
puppet-tripleo.noarch               5.6.4-3.el7ost
python-glance.noarch                1:13.0.0-2.el7ost

How reproducible:
always

Comment 2 Filip HubĂ­k 2018-04-20 13:12:34 UTC
*** Bug 1570026 has been marked as a duplicate of this bug. ***

Comment 4 John Fulton 2018-04-20 19:01:41 UTC
The openstack.keyring created by puppet-ceph for use by glance doesn't seem to have adequate privileges.

[root@controller-0 ~]# rbd -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring -p images ls
[root@controller-0 ~]# rbd -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.openstack.keyring -p images ls
rbd: couldn't connect to the cluster!
2018-04-20 18:56:59.486026 7f27c4687d80  0 librados: client.admin authentication error (22) Invalid argument
[root@controller-0 ~]#

Comment 7 John Fulton 2018-04-20 20:04:42 UTC
ceph tell mon.0 injectargs --debug-auth 20/20

rbd -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.openstack.keyring -p images ls
==> ceph-mon.controller-0.log <==
2018-04-20 19:56:58.987116 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-20 19:56:59.910703 7f296d2b8700 10 In get_auth_session_handler for protocol 0
2018-04-20 19:56:59.911271 7f29700c9700 10 cephx server client.admin: start_session server_challenge 63e5a7ace587694d
2018-04-20 19:56:59.911568 7f29700c9700 10 cephx server client.admin: handle_request get_auth_session_key for client.admin
2018-04-20 19:56:59.911613 7f29700c9700  0 mon.controller-0@0(leader).auth v13 caught error when trying to handle auth request, probably malformed request
2018-04-20 19:57:03.987244 7f29708ca700 10 cephx keyserver: _check_rotating_secrets

-----------------

rbd -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring -p images ls
==> ceph-mon.controller-0.log <==
2018-04-20 20:00:23.755990 7f296d2b8700 10 In get_auth_session_handler for protocol 0
2018-04-20 20:00:23.756829 7f29700c9700 10 cephx server client.admin: start_session server_challenge d7ebe7ac2388f3e0
2018-04-20 20:00:23.758262 7f29700c9700 10 cephx server client.admin: handle_request get_auth_session_key for client.admin
2018-04-20 20:00:23.758321 7f29700c9700 20 cephx server client.admin:  checking key: req.key=3cf507a4ff9086f6 expected_key=3cf507a4ff9086f6
2018-04-20 20:00:23.758360 7f29700c9700 10 cephx: build_service_ticket_reply encoding 1 tickets with secret AQBtydlaAAAAABAA0ddLdyQL84ZrfOcOqSfUCA==
2018-04-20 20:00:23.758375 7f29700c9700 10 cephx: build_service_ticket service auth secret_id 2 ticket_info.ticket.name=client.admin
2018-04-20 20:00:23.758386 7f29700c9700 10 cephx keyserverdata: get_caps: name=client.admin
2018-04-20 20:00:23.758387 7f29700c9700 10 cephx keyserverdata: get_secret: num of caps=3
2018-04-20 20:00:23.758801 7f29700c9700 10 cephx server client.admin: handle_request get_principal_session_key
2018-04-20 20:00:23.758803 7f29700c9700 10 cephx: verify_authorizer decrypted service auth secret_id=2
2018-04-20 20:00:23.758827 7f29700c9700 10 cephx: verify_authorizer global_id=4225
2018-04-20 20:00:23.758836 7f29700c9700 10 cephx: verify_authorizer ok nonce 6b8b4567327b23c6 reply_bl.length()=36
2018-04-20 20:00:23.758838 7f29700c9700 10 cephx server client.admin:  ticket_req.keys = 5
2018-04-20 20:00:23.758840 7f29700c9700 10 cephx server client.admin:  adding key for service mon
2018-04-20 20:00:23.758862 7f29700c9700 10 cephx server client.admin:  adding key for service osd
2018-04-20 20:00:23.758877 7f29700c9700 10 cephx keyserverdata: get_caps: name=client.admin
2018-04-20 20:00:23.758878 7f29700c9700 10 cephx keyserverdata: get_secret: num of caps=3
2018-04-20 20:00:23.758882 7f29700c9700 10 cephx: build_service_ticket_reply encoding 2 tickets with secret AQDXRtpaw1YzLRAA0/+G4CM2Ng8qqtdQ08pHMA==
2018-04-20 20:00:23.758892 7f29700c9700 10 cephx: build_service_ticket service mon secret_id 9 ticket_info.ticket.name=client.admin
2018-04-20 20:00:23.758903 7f29700c9700 10 cephx: build_service_ticket service osd secret_id 9 ticket_info.ticket.name=client.admin
2018-04-20 20:00:23.993846 7f29708ca700 10 cephx keyserver: _check_rotating_secrets

Comment 9 John Fulton 2018-04-24 12:49:40 UTC
I was wrong in comment #7. My request was malformed. Here is the same key that was created working:

[root@controller-0 ceph]# rbd -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.openstack.keyring -p images ls --id openstack
foo
[root@controller-0 ceph]# 

[root@controller-0 ceph]# rbd -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.openstack.keyring -p images ls -n client.openstack
foo
[root@controller-0 ceph]#

==> ceph-mon.controller-0.log <==
2018-04-24 12:48:26.406232 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-24 12:48:31.406369 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-24 12:48:36.406495 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-24 12:48:39.168739 7f29708ca700  0 mon.controller-0@0(leader).data_health(3) update_stats avail 76% total 40947 MB, used 9793 MB, avail 31153 MB
2018-04-24 12:48:41.406701 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-24 12:48:46.406859 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-24 12:48:51.407040 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-24 12:48:56.407211 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-24 12:49:01.407375 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-24 12:49:06.407520 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-24 12:49:11.407698 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-24 12:49:16.407820 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-24 12:49:16.971214 7f296d2b8700 10 In get_auth_session_handler for protocol 0
2018-04-24 12:49:16.971887 7f29700c9700 10 cephx server client.openstack: start_session server_challenge 9562e3f8b3dfdb21
2018-04-24 12:49:16.972241 7f29700c9700 10 cephx server client.openstack: handle_request get_auth_session_key for client.openstack
2018-04-24 12:49:16.972264 7f29700c9700 20 cephx server client.openstack:  checking key: req.key=a2bb462b5222e920 expected_key=a2bb462b5222e920
2018-04-24 12:49:16.972325 7f29700c9700 10 cephx: build_service_ticket_reply encoding 1 tickets with secret AQBtydlaAAAAABAAQ6qzhegXg+GXw7WAnaFFDA==
2018-04-24 12:49:16.972346 7f29700c9700 10 cephx: build_service_ticket service auth secret_id 9 ticket_info.ticket.name=client.openstack
2018-04-24 12:49:16.972355 7f29700c9700 10 cephx keyserverdata: get_caps: name=client.openstack
2018-04-24 12:49:16.972356 7f29700c9700 10 cephx keyserverdata: get_secret: num of caps=2
2018-04-24 12:49:16.972919 7f29700c9700 10 cephx server client.openstack: handle_request get_principal_session_key
2018-04-24 12:49:16.972921 7f29700c9700 10 cephx: verify_authorizer decrypted service auth secret_id=9
2018-04-24 12:49:16.972945 7f29700c9700 10 cephx: verify_authorizer global_id=4234
2018-04-24 12:49:16.972953 7f29700c9700 10 cephx: verify_authorizer ok nonce 6b8b4567327b23c6 reply_bl.length()=36
2018-04-24 12:49:16.972954 7f29700c9700 10 cephx server client.openstack:  ticket_req.keys = 5
2018-04-24 12:49:16.972956 7f29700c9700 10 cephx server client.openstack:  adding key for service mon
2018-04-24 12:49:16.972976 7f29700c9700 10 cephx server client.openstack:  adding key for service osd
2018-04-24 12:49:16.972989 7f29700c9700 10 cephx keyserverdata: get_caps: name=client.openstack
2018-04-24 12:49:16.972991 7f29700c9700 10 cephx keyserverdata: get_secret: num of caps=2
2018-04-24 12:49:16.972994 7f29700c9700 10 cephx: build_service_ticket_reply encoding 2 tickets with secret AQDMJ99aOcrzORAAW1HLRP2RYyKTQFUWK8t0yg==
2018-04-24 12:49:16.973004 7f29700c9700 10 cephx: build_service_ticket service mon secret_id 98 ticket_info.ticket.name=client.openstack
2018-04-24 12:49:16.973014 7f29700c9700 10 cephx: build_service_ticket service osd secret_id 98 ticket_info.ticket.name=client.openstack

Comment 13 John Fulton 2018-04-24 20:56:47 UTC
Root cause:

The fix for CVE-2017-12155 contained patches for puppet-tripleo and tripleo-heat-templates (THT). 

 https://review.openstack.org/#/c/528599/ (puppet-tripleo)
 https://review.openstack.org/#/c/537874/ (puppet-tripleo)
 https://review.openstack.org/#/c/528595/ (tripleo-heat-templates)

The problem is that only the THT patches are in this puddle [1] and both puppet-tripleo patches are missing [2]. 

So tripleo-heat-templates were setting the key to 640 [1] but puppet-tripleo was not adding the ACLs that glance, cinder, or nova will need to be able to read the key. 

This seems to be a packaging issue. 


[1] 

[stack@undercloud-0 ~]$ grep -B 2 640 /usr/share/openstack-tripleo-heat-templates/puppet/services/ceph-base.yaml
              client.CLIENT_USER: {
                secret: 'CLIENT_KEY',
                mode: '0640',
[stack@undercloud-0 ~]$

[2] 

[root@controller-0 base]# grep -i acl nova/libvirt.pp 
[root@controller-0 base]# grep -i acl glance/api.pp 
[root@controller-0 base]# pwd
/etc/puppet/modules/tripleo/manifests/profile/base
[root@controller-0 base]# rpm -q puppet-tripleo
puppet-tripleo-5.6.4-3.el7ost.noarch
[root@controller-0 base]#

Comment 16 Steve Linabery 2018-04-25 03:20:49 UTC
Marking this CLOSED NOTABUG because phase1 is passing now that we are using overcloud images that contain the right content.