Bug 1570007 - glance has issue with communicating with ceph after fresh install
Summary: glance has issue with communicating with ceph after fresh install
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-tripleo
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: zstream
Target Release: ---
Assignee: RHOS Maint
QA Contact: nlevinki
URL:
Whiteboard:
Duplicates: 1570026
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2018-04-20 12:37 UTC by Attila Fazekas
Modified: 2018-08-03 18:30 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-25 03:20:49 UTC
Target Upstream Version:


Attachments

Description Attila Fazekas 2018-04-20 12:37:00 UTC
Description of problem:

We are unable to upload images:
/var/log/glance/api.log:
2018-04-19 19:31:32.865 139939 DEBUG glance_store.location [req-dfb1bf42-2abe-45b3-9c1f-e4279668119c e52a82a36b8243778df0257afda656bf acd27d64a1a541f1be0fb8619ef2ce52 - default default] Registering scheme rbd with {'location_class': <class 'glance_store._drivers.rbd.StoreLocation'>, 'store': <glance_store._drivers.rbd.Store object at 0x7f5cd587e250>, 'store_entry': 'glance.store.rbd.Store'} register_scheme_map /usr/lib/python2.7/site-packages/glance_store/location.py:88
2018-04-19 19:31:32.899 139939 ERROR glance_store._drivers.rbd [req-dfb1bf42-2abe-45b3-9c1f-e4279668119c e52a82a36b8243778df0257afda656bf acd27d64a1a541f1be0fb8619ef2ce52 - default default] Error connecting to ceph cluster.
2018-04-19 19:31:32.899 139939 ERROR glance_store._drivers.rbd Traceback (most recent call last):
2018-04-19 19:31:32.899 139939 ERROR glance_store._drivers.rbd   File "/usr/lib/python2.7/site-packages/glance_store/_drivers/rbd.py", line 273, in get_connection
2018-04-19 19:31:32.899 139939 ERROR glance_store._drivers.rbd     client.connect(timeout=self.connect_timeout)
2018-04-19 19:31:32.899 139939 ERROR glance_store._drivers.rbd   File "rados.pyx", line 785, in rados.Rados.connect (rados.c:8969)
2018-04-19 19:31:32.899 139939 ERROR glance_store._drivers.rbd ObjectNotFound: error connecting to the cluster
2018-04-19 19:31:32.899 139939 ERROR glance_store._drivers.rbd 
2018-04-19 19:31:32.901 139939 DEBUG oslo_messaging._drivers.amqpdriver [req-dfb1bf42-2abe-45b3-9c1f-e4279668119c e52a82a36b8243778df0257afda656bf acd27d64a1a541f1be0fb8619ef2ce52 - default default] CAST unique_id: fbddddb9c31e417aadbc8b531be102e2 NOTIFY exchange 'glance' topic 'notifications.error' _send /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:552
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data [req-dfb1bf42-2abe-45b3-9c1f-e4279668119c e52a82a36b8243778df0257afda656bf acd27d64a1a541f1be0fb8619ef2ce52 - default default] Failed to upload image data due to internal error
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data Traceback (most recent call last):
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance/api/v2/image_data.py", line 115, in upload
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     image.set_data(data, size)
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance/domain/proxy.py", line 195, in set_data
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     self.base.set_data(data, size)
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance/notifier.py", line 479, in set_data
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     _send_notification(notify_error, 'image.upload', msg)
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     self.force_reraise()
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     six.reraise(self.type_, self.value, self.tb)
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance/notifier.py", line 426, in set_data
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     self.repo.set_data(data, size)
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance/api/policy.py", line 187, in set_data
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     return self.image.set_data(*args, **kwargs)
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance/quota/__init__.py", line 304, in set_data
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     self.image.set_data(data, size=size)
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance/location.py", line 440, in set_data
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     verifier=verifier)
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance_store/backend.py", line 452, in add_to_backend
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     verifier)
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance_store/backend.py", line 425, in store_add_to_backend
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     verifier=verifier)
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance_store/capabilities.py", line 225, in op_checker
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     return store_op_fun(store, *args, **kwargs)
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance_store/_drivers/rbd.py", line 447, in add
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     rados_id=self.user) as conn:
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     return self.gen.next()
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data   File "/usr/lib/python2.7/site-packages/glance_store/_drivers/rbd.py", line 277, in get_connection
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data     raise exceptions.BackendException()
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data BackendException
2018-04-19 19:31:32.906 139939 ERROR glance.api.v2.image_data 


However, the ceph health looks OK.

Version-Release number of selected component (if applicable):
ceph-base.x86_64                    2:10.2.10-17.el7cp 
puppet-ceph.noarch                  2.4.1-2.el7ost
python-rados.x86_64                 2:10.2.10-17.el7cp
puppet-tripleo.noarch               5.6.4-3.el7ost
python-glance.noarch                1:13.0.0-2.el7ost

How reproducible:
always

Comment 2 Filip Hubík 2018-04-20 13:12:34 UTC
*** Bug 1570026 has been marked as a duplicate of this bug. ***

Comment 4 John Fulton 2018-04-20 19:01:41 UTC
The openstack.keyring created by puppet-ceph for use by glance doesn't seem to have adequate privileges.

[root@controller-0 ~]# rbd -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring -p images ls
[root@controller-0 ~]# rbd -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.openstack.keyring -p images ls
rbd: couldn't connect to the cluster!
2018-04-20 18:56:59.486026 7f27c4687d80  0 librados: client.admin authentication error (22) Invalid argument
[root@controller-0 ~]#
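A likely explanation for the `client.admin` error above, consistent with what comment #9 later confirms: `-k` only selects which keyring *file* the CLI reads, while the client *name* still defaults to `client.admin`, whose key is not in the openstack keyring. A sketch of the distinction:

```shell
# Fails: reads the openstack keyring file but still tries to
# authenticate as client.admin, whose key is not in that file.
rbd -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.openstack.keyring -p images ls

# Works: same keyring, but the client name is passed explicitly,
# so the key in the file matches the presented identity.
rbd -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.openstack.keyring \
    --id openstack -p images ls
```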

Comment 7 John Fulton 2018-04-20 20:04:42 UTC
ceph tell mon.0 injectargs --debug-auth 20/20

rbd -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.openstack.keyring -p images ls
==> ceph-mon.controller-0.log <==
2018-04-20 19:56:58.987116 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-20 19:56:59.910703 7f296d2b8700 10 In get_auth_session_handler for protocol 0
2018-04-20 19:56:59.911271 7f29700c9700 10 cephx server client.admin: start_session server_challenge 63e5a7ace587694d
2018-04-20 19:56:59.911568 7f29700c9700 10 cephx server client.admin: handle_request get_auth_session_key for client.admin
2018-04-20 19:56:59.911613 7f29700c9700  0 mon.controller-0@0(leader).auth v13 caught error when trying to handle auth request, probably malformed request
2018-04-20 19:57:03.987244 7f29708ca700 10 cephx keyserver: _check_rotating_secrets

-----------------

rbd -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring -p images ls
==> ceph-mon.controller-0.log <==
2018-04-20 20:00:23.755990 7f296d2b8700 10 In get_auth_session_handler for protocol 0
2018-04-20 20:00:23.756829 7f29700c9700 10 cephx server client.admin: start_session server_challenge d7ebe7ac2388f3e0
2018-04-20 20:00:23.758262 7f29700c9700 10 cephx server client.admin: handle_request get_auth_session_key for client.admin
2018-04-20 20:00:23.758321 7f29700c9700 20 cephx server client.admin:  checking key: req.key=3cf507a4ff9086f6 expected_key=3cf507a4ff9086f6
2018-04-20 20:00:23.758360 7f29700c9700 10 cephx: build_service_ticket_reply encoding 1 tickets with secret AQBtydlaAAAAABAA0ddLdyQL84ZrfOcOqSfUCA==
2018-04-20 20:00:23.758375 7f29700c9700 10 cephx: build_service_ticket service auth secret_id 2 ticket_info.ticket.name=client.admin
2018-04-20 20:00:23.758386 7f29700c9700 10 cephx keyserverdata: get_caps: name=client.admin
2018-04-20 20:00:23.758387 7f29700c9700 10 cephx keyserverdata: get_secret: num of caps=3
2018-04-20 20:00:23.758801 7f29700c9700 10 cephx server client.admin: handle_request get_principal_session_key
2018-04-20 20:00:23.758803 7f29700c9700 10 cephx: verify_authorizer decrypted service auth secret_id=2
2018-04-20 20:00:23.758827 7f29700c9700 10 cephx: verify_authorizer global_id=4225
2018-04-20 20:00:23.758836 7f29700c9700 10 cephx: verify_authorizer ok nonce 6b8b4567327b23c6 reply_bl.length()=36
2018-04-20 20:00:23.758838 7f29700c9700 10 cephx server client.admin:  ticket_req.keys = 5
2018-04-20 20:00:23.758840 7f29700c9700 10 cephx server client.admin:  adding key for service mon
2018-04-20 20:00:23.758862 7f29700c9700 10 cephx server client.admin:  adding key for service osd
2018-04-20 20:00:23.758877 7f29700c9700 10 cephx keyserverdata: get_caps: name=client.admin
2018-04-20 20:00:23.758878 7f29700c9700 10 cephx keyserverdata: get_secret: num of caps=3
2018-04-20 20:00:23.758882 7f29700c9700 10 cephx: build_service_ticket_reply encoding 2 tickets with secret AQDXRtpaw1YzLRAA0/+G4CM2Ng8qqtdQ08pHMA==
2018-04-20 20:00:23.758892 7f29700c9700 10 cephx: build_service_ticket service mon secret_id 9 ticket_info.ticket.name=client.admin
2018-04-20 20:00:23.758903 7f29700c9700 10 cephx: build_service_ticket service osd secret_id 9 ticket_info.ticket.name=client.admin
2018-04-20 20:00:23.993846 7f29708ca700 10 cephx keyserver: _check_rotating_secrets

Comment 9 John Fulton 2018-04-24 12:49:40 UTC
I was wrong in comment #7; my own request was malformed. Here is the same key that was created, now shown working:

[root@controller-0 ceph]# rbd -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.openstack.keyring -p images ls --id openstack
foo
[root@controller-0 ceph]# 

[root@controller-0 ceph]# rbd -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.openstack.keyring -p images ls -n client.openstack
foo
[root@controller-0 ceph]#

==> ceph-mon.controller-0.log <==
2018-04-24 12:48:26.406232 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-24 12:48:31.406369 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-24 12:48:36.406495 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-24 12:48:39.168739 7f29708ca700  0 mon.controller-0@0(leader).data_health(3) update_stats avail 76% total 40947 MB, used 9793 MB, avail 31153 MB
2018-04-24 12:48:41.406701 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-24 12:48:46.406859 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-24 12:48:51.407040 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-24 12:48:56.407211 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-24 12:49:01.407375 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-24 12:49:06.407520 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-24 12:49:11.407698 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-24 12:49:16.407820 7f29708ca700 10 cephx keyserver: _check_rotating_secrets
2018-04-24 12:49:16.971214 7f296d2b8700 10 In get_auth_session_handler for protocol 0
2018-04-24 12:49:16.971887 7f29700c9700 10 cephx server client.openstack: start_session server_challenge 9562e3f8b3dfdb21
2018-04-24 12:49:16.972241 7f29700c9700 10 cephx server client.openstack: handle_request get_auth_session_key for client.openstack
2018-04-24 12:49:16.972264 7f29700c9700 20 cephx server client.openstack:  checking key: req.key=a2bb462b5222e920 expected_key=a2bb462b5222e920
2018-04-24 12:49:16.972325 7f29700c9700 10 cephx: build_service_ticket_reply encoding 1 tickets with secret AQBtydlaAAAAABAAQ6qzhegXg+GXw7WAnaFFDA==
2018-04-24 12:49:16.972346 7f29700c9700 10 cephx: build_service_ticket service auth secret_id 9 ticket_info.ticket.name=client.openstack
2018-04-24 12:49:16.972355 7f29700c9700 10 cephx keyserverdata: get_caps: name=client.openstack
2018-04-24 12:49:16.972356 7f29700c9700 10 cephx keyserverdata: get_secret: num of caps=2
2018-04-24 12:49:16.972919 7f29700c9700 10 cephx server client.openstack: handle_request get_principal_session_key
2018-04-24 12:49:16.972921 7f29700c9700 10 cephx: verify_authorizer decrypted service auth secret_id=9
2018-04-24 12:49:16.972945 7f29700c9700 10 cephx: verify_authorizer global_id=4234
2018-04-24 12:49:16.972953 7f29700c9700 10 cephx: verify_authorizer ok nonce 6b8b4567327b23c6 reply_bl.length()=36
2018-04-24 12:49:16.972954 7f29700c9700 10 cephx server client.openstack:  ticket_req.keys = 5
2018-04-24 12:49:16.972956 7f29700c9700 10 cephx server client.openstack:  adding key for service mon
2018-04-24 12:49:16.972976 7f29700c9700 10 cephx server client.openstack:  adding key for service osd
2018-04-24 12:49:16.972989 7f29700c9700 10 cephx keyserverdata: get_caps: name=client.openstack
2018-04-24 12:49:16.972991 7f29700c9700 10 cephx keyserverdata: get_secret: num of caps=2
2018-04-24 12:49:16.972994 7f29700c9700 10 cephx: build_service_ticket_reply encoding 2 tickets with secret AQDMJ99aOcrzORAAW1HLRP2RYyKTQFUWK8t0yg==
2018-04-24 12:49:16.973004 7f29700c9700 10 cephx: build_service_ticket service mon secret_id 98 ticket_info.ticket.name=client.openstack
2018-04-24 12:49:16.973014 7f29700c9700 10 cephx: build_service_ticket service osd secret_id 98 ticket_info.ticket.name=client.openstack
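The reason `--id openstack` (equivalently `-n client.openstack`) works is that a keyring file is INI-style and its section header is the cephx entity name the client must present. A minimal sketch (the key value below is fake, for illustration only):

```shell
# Write a sample keyring; the section header is the entity name.
# The key value here is fabricated, not a real secret.
cat > /tmp/ceph.client.openstack.keyring <<'EOF'
[client.openstack]
	key = AQBGOtpaAAAAABAAfakefakefakefakefakefakefg==
EOF

# Extract the entity name from the section header:
awk -F'[][]' '/^\[/ {print $2}' /tmp/ceph.client.openstack.keyring
# → client.openstack
```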

Comment 13 John Fulton 2018-04-24 20:56:47 UTC
Root cause:

The fix for CVE-2017-12155 contained patches for puppet-tripleo and tripleo-heat-templates (THT). 

 https://review.openstack.org/#/c/528599/ (puppet-tripleo)
 https://review.openstack.org/#/c/537874/ (puppet-tripleo)
 https://review.openstack.org/#/c/528595/ (tripleo-heat-templates)

The problem is that only the THT patch is in this puddle [1], while both puppet-tripleo patches are missing [2].

So tripleo-heat-templates was setting the key mode to 0640 [1], but puppet-tripleo was not adding the ACLs that glance, cinder, and nova need in order to read the key.

This seems to be a packaging issue. 


[1] 

[stack@undercloud-0 ~]$ grep -B 2 640 /usr/share/openstack-tripleo-heat-templates/puppet/services/ceph-base.yaml
              client.CLIENT_USER: {
                secret: 'CLIENT_KEY',
                mode: '0640',
[stack@undercloud-0 ~]$

[2] 

[root@controller-0 base]# grep -i acl nova/libvirt.pp 
[root@controller-0 base]# grep -i acl glance/api.pp 
[root@controller-0 base]# pwd
/etc/puppet/modules/tripleo/manifests/profile/base
[root@controller-0 base]# rpm -q puppet-tripleo
puppet-tripleo-5.6.4-3.el7ost.noarch
[root@controller-0 base]#
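For reference, the missing puppet-tripleo change would grant the service users read access to the 0640 keyring via POSIX ACLs. A hypothetical sketch of the intended end state (user names and exact invocation are assumptions, not taken from the actual patches):

```shell
# Hypothetical end state once the puppet-tripleo ACL patches land:
# the keyring stays mode 0640 (set by THT), and ACL entries let the
# service users read it without loosening the base permissions.
setfacl -m u:glance:r--,u:cinder:r--,u:nova:r-- \
    /etc/ceph/ceph.client.openstack.keyring

# Verify: the output should list user:glance:r-- etc.
getfacl /etc/ceph/ceph.client.openstack.keyring
```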

Comment 16 Steve Linabery 2018-04-25 03:20:49 UTC
Marking this CLOSED NOTABUG because phase1 is passing now that we are using overcloud images that contain the right content.

