Bug 1569290 - Ceph upgrade fails during OSP10 to OSP13 fast forward upgrade while running TASK [ceph-client : chmod cephx key(s)]
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Ansible
Version: 3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 3.1
Assignee: Sébastien Han
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks: 1548353 1571947
 
Reported: 2018-04-19 00:27 UTC by Marius Cornea
Modified: 2019-01-09 08:52 UTC (History)

Fixed In Version: RHEL: ceph-ansible-3.1.0-0.1.beta8.el7cp
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-09 08:52:17 UTC
Target Upstream Version:


Attachments
ceph-install-workflow.log (5.06 MB, text/plain)
2018-04-19 00:27 UTC, Marius Cornea


Links
System ID Private Priority Status Summary Last Updated
Github ceph ceph-ansible pull 2540 0 None None None 2018-04-20 14:39:09 UTC

Description Marius Cornea 2018-04-19 00:27:30 UTC
Created attachment 1423819
ceph-install-workflow.log

Description of problem:
The Ceph upgrade fails during the OSP10 to OSP13 fast forward upgrade while running TASK [ceph-client : chmod cephx key(s)]

Snippet from /var/log/mistral/ceph-install-workflow.log:

2018-04-18 20:01:41,306 p=26738 u=mistral |  TASK [ceph-client : chmod cephx key(s)] ****************************************
2018-04-18 20:01:41,307 p=26738 u=mistral |  task path: /usr/share/ceph-ansible/roles/ceph-client/tasks/create_users_keys.yml:99
2018-04-18 20:01:41,307 p=26738 u=mistral |  Wednesday 18 April 2018  20:01:41 -0400 (0:00:00.072)       0:11:19.185 ******* 
2018-04-18 20:01:41,573 p=26738 u=mistral |  changed: [192.168.24.8] => (item={'caps': {'mds': u"''", 'osd': u"'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics'", 'mon': u"'allow r'", 'mgr': u"'allow *'"}, 'mode': u'0600', 'key': u'AQDroddaAAAAABAA64fr7tW0dXZue2Pl9Wi8Qg==', 'name': u'client.openstack'}) => {"changed": true, "gid": 0, "group": "root", "item": {"caps": {"mds": "''", "mgr": "'allow *'", "mon": "'allow r'", "osd": "'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics'"}, "key": "AQDroddaAAAAABAA64fr7tW0dXZue2Pl9Wi8Qg==", "mode": "0600", "name": "client.openstack"}, "mode": "0600", "owner": "root", "path": "/etc/ceph/ceph.client.openstack.keyring", "secontext": "system_u:object_r:container_file_t:s0", "size": 262, "state": "file", "uid": 0}
2018-04-18 20:01:41,818 p=26738 u=mistral |  failed: [192.168.24.8] (item={'caps': {'mds': u"'allow *'", 'osd': u"'allow rw'", 'mon': u'\'allow r, allow command \\\\\\"auth del\\\\\\", allow command \\\\\\"auth caps\\\\\\", allow command \\\\\\"auth get\\\\\\", allow command \\\\\\"auth get-or-create\\\\\\"\'', 'mgr': u"'allow *'"}, 'name': u'client.manila', 'key': u'AQC+t9daAAAAABAA0zzoLz1NwZbBxrHNk14I4g==', 'mode': u'0600'}) => {"changed": false, "item": {"caps": {"mds": "'allow *'", "mgr": "'allow *'", "mon": "'allow r, allow command \\\\\\\"auth del\\\\\\\", allow command \\\\\\\"auth caps\\\\\\\", allow command \\\\\\\"auth get\\\\\\\", allow command \\\\\\\"auth get-or-create\\\\\\\"'", "osd": "'allow rw'"}, "key": "AQC+t9daAAAAABAA0zzoLz1NwZbBxrHNk14I4g==", "mode": "0600", "name": "client.manila"}, "msg": "file (/etc/ceph/ceph.client.manila.keyring) is absent, cannot continue", "path": "/etc/ceph/ceph.client.manila.keyring", "state": "absent"}
2018-04-18 20:01:42,066 p=26738 u=mistral |  failed: [192.168.24.8] (item={'caps': {'mds': u"''", 'osd': u"'allow rwx'", 'mon': u"'allow rw'", 'mgr': u"'allow *'"}, 'mode': u'0600', 'key': u'AQDroddaAAAAABAAtrsj06ioGk1GRO2T4XUgOw==', 'name': u'client.radosgw'}) => {"changed": false, "item": {"caps": {"mds": "''", "mgr": "'allow *'", "mon": "'allow rw'", "osd": "'allow rwx'"}, "key": "AQDroddaAAAAABAAtrsj06ioGk1GRO2T4XUgOw==", "mode": "0600", "name": "client.radosgw"}, "msg": "file (/etc/ceph/ceph.client.radosgw.keyring) is absent, cannot continue", "path": "/etc/ceph/ceph.client.radosgw.keyring", "state": "absent"}

/etc/ceph/ceph.client.radosgw.keyring and /etc/ceph/ceph.client.manila.keyring do not exist on the compute node (192.168.24.8) because these services were not enabled in the initial OSP10 deployment.
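
To make the failure condition concrete, here is a minimal Python sketch. It is not ceph-ansible code: the helper names `keyring_path` and `split_by_keyring_presence` and the temporary directory are hypothetical. It only assumes the `ceph.<name>.keyring` file naming visible in the log above, and reports which of the requested key names have a keyring file on disk.

```python
import os
import tempfile


def keyring_path(conf_dir, name):
    # File naming follows the paths seen in the log, e.g. ceph.client.manila.keyring.
    return os.path.join(conf_dir, "ceph.%s.keyring" % name)


def split_by_keyring_presence(conf_dir, names):
    """Return (present, absent) lists of key names, preserving input order."""
    present, absent = [], []
    for name in names:
        if os.path.isfile(keyring_path(conf_dir, name)):
            present.append(name)
        else:
            absent.append(name)
    return present, absent


def demo():
    # Hypothetical stand-in for /etc/ceph on a node where only the
    # client.openstack keyring was ever created.
    names = ["client.openstack", "client.manila", "client.radosgw"]
    with tempfile.TemporaryDirectory() as conf_dir:
        with open(keyring_path(conf_dir, "client.openstack"), "w") as handle:
            handle.write("[client.openstack]\n")
        return split_by_keyring_presence(conf_dir, names)
```

In this sketch the two absent names correspond to the two items for which the chmod task reported "is absent, cannot continue".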

Version-Release number of selected component (if applicable):
ceph-ansible-3.1.0-0.1.beta6.el7cp.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP10 with 3 controllers + 2 computes + 3 ceph osd nodes
2. Upgrade to OSP13 via the fast forward procedure

Actual results:
Ceph upgrade stage fails while running ceph-ansible.

Expected results:
Ceph upgrade succeeds fine.

Additional info:
Attaching /var/log/mistral/ceph-install-workflow.log and sosreports.

Comment 4 Giulio Fidente 2018-04-19 12:35:22 UTC
Since OSP12, Director always passes the full list of keyrings to be created, whereas previously it could selectively create keyrings based on which services were enabled.
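
As a rough illustration of the selective behaviour described here, the following Python sketch filters a list of key names by enabled services. The mapping `KEY_REQUIRES_SERVICE`, the service names, and the function `select_keys` are assumptions made for the example; they are not taken from Director or its Heat templates.

```python
# Hypothetical mapping from key name to the service that needs it.
# None marks a key that is always wanted.
KEY_REQUIRES_SERVICE = {
    "client.openstack": None,
    "client.manila": "manila",
    "client.radosgw": "radosgw",
}


def select_keys(key_names, enabled_services):
    """Keep only the keys whose required service is enabled.

    Key names missing from the mapping are treated as always wanted.
    """
    selected = []
    for name in key_names:
        service = KEY_REQUIRES_SERVICE.get(name)
        if service is None or service in enabled_services:
            selected.append(name)
    return selected
```

With no optional services enabled, only `client.openstack` would be selected; passing the full list regardless of enabled services is the behaviour that leads to the missing keyring files in this report.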

In the long term we want Director to behave as it used to in OSP10, but after moving to ceph-ansible this became a complicated issue to solve in the Heat templates.

If we could make ceph-ansible continue or create the missing keyrings, that would give us more time to work on a proper fix in Director.
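
One way to picture the "continue" option is the Python sketch below. It is an illustration only, not the change made in ceph-ansible: the function `chmod_existing_keyrings` is hypothetical, and the `name` and `mode` fields are modelled on the items shown in the log. It applies the requested mode to keyring files that exist and skips the ones that do not, instead of failing.

```python
import os
import stat
import tempfile


def chmod_existing_keyrings(conf_dir, keys):
    """Apply each key's mode to its keyring file if present; skip it otherwise.

    Returns (changed, skipped) lists of key names.
    """
    changed, skipped = [], []
    for key in keys:
        path = os.path.join(conf_dir, "ceph.%s.keyring" % key["name"])
        if not os.path.isfile(path):
            skipped.append(key["name"])  # tolerate the missing file
            continue
        os.chmod(path, int(key["mode"], 8))  # "0600" -> 0o600
        changed.append(key["name"])
    return changed, skipped


def demo():
    keys = [
        {"name": "client.openstack", "mode": "0600"},
        {"name": "client.manila", "mode": "0600"},
    ]
    # Hypothetical stand-in for /etc/ceph with only one keyring present.
    with tempfile.TemporaryDirectory() as conf_dir:
        path = os.path.join(conf_dir, "ceph.client.openstack.keyring")
        with open(path, "w") as handle:
            handle.write("[client.openstack]\n")
        os.chmod(path, 0o644)
        changed, skipped = chmod_existing_keyrings(conf_dir, keys)
        mode = stat.S_IMODE(os.stat(path).st_mode)
    return changed, skipped, mode
```

Skipping silently hides the mismatch between the requested keys and what is on disk, which is the technical debt discussed in the comments that follow.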

Comment 5 Sébastien Han 2018-04-20 08:51:25 UTC
ceph-ansible is doing its job correctly; it's just Director passing it false information. So ceph-ansible behaves as expected.

Giulio, how long is the Director fix going to take?

Comment 6 Giulio Fidente 2018-04-20 09:28:16 UTC
(In reply to leseb from comment #5)
> ceph-ansible is doing its job correctly, it's just director passing the
> false information. So ceph-ansible behaves as expected.

agreed, but it's also behaving differently than it used to
 
> Giulio, how long is the Director fix going to take?

I am not sure this is going to happen anytime soon, and it is more a generic limitation in the framework than a bug; to start with, I filed an RFE for Director, #1569920, but it's targeted for Rocky.

Comment 7 Sébastien Han 2018-04-20 09:35:16 UTC
(In reply to Giulio Fidente from comment #6)
> (In reply to leseb from comment #5)
> > ceph-ansible is doing its job correctly, it's just director passing the
> > false information. So ceph-ansible behaves as expected.
> 
> agreed, but it's also behaving differently than it used to

Can you be more specific?

>  
> > Giulio, how long is the Director fix going to take?
> 
> I am not sure this is going to happen anytime soon and it is more a generic
> limitation in the framework than a bug; to start with I filed an RFE for
> Director #1569920 but it's targeted for Rocky.

OK, so if I understand correctly, ceph-ansible has to somehow work around this because it's blocking the release?

Comment 8 Giulio Fidente 2018-04-20 09:43:00 UTC
(In reply to leseb from comment #7)
> (In reply to Giulio Fidente from comment #6)
> > (In reply to leseb from comment #5)
> > > ceph-ansible is doing its job correctly, it's just director passing the
> > > false information. So ceph-ansible behaves as expected.
> > 
> > agreed, but it's also behaving differently than it used to
> 
> Can you be more specific?

previously (beta5) it wouldn't fail on the same task, even though we passed the same parameters we are passing now
 
> > > Giulio, how long is the Director fix going to take?
> > 
> > I am not sure this is going to happen anytime soon and it is more a generic
> > limitation in the framework than a bug; to start with I filed an RFE for
> > Director #1569920 but it's targeted for Rocky.
> 
> Ok so if I understand correctly, ceph-ansible has to somehow work around this
> because it's blocking the release?

yes, and it is not great, I agree; I do want to address this in Director but I don't think it can happen quickly enough so we'll have to carry some technical debt until it is resolved

Comment 9 Sébastien Han 2018-04-23 17:09:19 UTC
Will be in beta8

Comment 12 Yogev Rabl 2018-05-18 18:58:20 UTC
Verified

