Bug 1569290
Summary: Ceph upgrade fails during OSP10 to OSP13 fast forward upgrade while running TASK [ceph-client : chmod cephx key(s)]
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: Ceph-Ansible
Version: 3.0
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: unspecified
Reporter: Marius Cornea <mcornea>
Assignee: Sébastien Han <shan>
QA Contact: Yogev Rabl <yrabl>
CC: adeza, aschoen, ceph-eng-bugs, gfidente, gmeno, johfulto, kdreyer, nthomas, pgrist, sankarshan, sasha, yprokule, yrabl
Target Milestone: rc
Target Release: 3.1
Hardware: Unspecified
OS: Unspecified
Fixed In Version: RHEL: ceph-ansible-3.1.0-0.1.beta8.el7cp
Last Closed: 2019-01-09 08:52:17 UTC
Type: Bug
Bug Blocks: 1548353, 1571947
Attachments: ceph-install-workflow.log
From OSP12 onwards, Director always passes the full list of keyrings to be created, while previously it could selectively create keyrings based on which services were enabled. In the long term we want Director to behave as it used to in OSP10, but after moving to ceph-ansible this became a complicated issue to solve in the Heat templates. If we could make ceph-ansible continue past (or create) the missing keyrings, that would give us more time to work on a proper fix in Director.

ceph-ansible is doing its job correctly; it's just Director passing the false information. So ceph-ansible behaves as expected. Giulio, how long is the Director fix going to take?

(In reply to leseb from comment #5)
> ceph-ansible is doing its job correctly, it's just Director passing the
> false information. So ceph-ansible behaves as expected.

Agreed, but it's also behaving differently than it used to.

> Giulio, how long is the Director fix going to take?

I am not sure this is going to happen anytime soon, and it is more a generic limitation in the framework than a bug; to start with I filed an RFE for Director, bug #1569920, but it's targeted for Rocky.

(In reply to Giulio Fidente from comment #6)
> Agreed, but it's also behaving differently than it used to.

Can you be more specific?

> I am not sure this is going to happen anytime soon, and it is more a generic
> limitation in the framework than a bug; to start with I filed an RFE for
> Director, bug #1569920, but it's targeted for Rocky.

OK, so if I understand correctly, ceph-ansible has to somehow work around this because it's blocking the release?
(In reply to leseb from comment #7)
> Can you be more specific?

Previously (beta5) it wouldn't fail on the same task, even though we passed the same parameters we are passing now.

> OK, so if I understand correctly, ceph-ansible has to somehow work around
> this because it's blocking the release?

Yes, and it is not great, I agree; I do want to address this in Director, but I don't think it can happen quickly enough, so we'll have to carry some technical debt until it is resolved.

Will be in beta8.

Verified
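The actual beta8 change isn't quoted in this bug. As a rough illustration of the kind of guard that lets the role tolerate keyrings Director lists but never created, one could stat the files first and skip the absent ones. This is a hypothetical sketch, not the real ceph-ansible patch; the `keys` variable stands in for the key list Director passes:

```yaml
# Hypothetical sketch: only chmod keyrings that actually exist on the node.
- name: stat cephx key(s)
  stat:
    path: "/etc/ceph/ceph.{{ item.name }}.keyring"
  with_items: "{{ keys }}"   # 'keys' stands in for the list Director passes
  register: keyring_stat

- name: chmod cephx key(s)
  file:
    path: "/etc/ceph/ceph.{{ item.item.name }}.keyring"
    mode: "{{ item.item.mode }}"
  with_items: "{{ keyring_stat.results }}"
  when: item.stat.exists     # skip, instead of failing on, absent keyrings
```

The `when` guard turns the hard failure into a skipped item, which matches the "continue past the missing keyrings" behavior requested in the comments above.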
Created attachment 1423819 [details]
ceph-install-workflow.log

Description of problem:
Ceph upgrade fails during OSP10 to OSP13 fast forward upgrade while running TASK [ceph-client : chmod cephx key(s)].

Snippet from /var/log/mistral/ceph-install-workflow.log:

2018-04-18 20:01:41,306 p=26738 u=mistral | TASK [ceph-client : chmod cephx key(s)] ****************************************
2018-04-18 20:01:41,307 p=26738 u=mistral | task path: /usr/share/ceph-ansible/roles/ceph-client/tasks/create_users_keys.yml:99
2018-04-18 20:01:41,307 p=26738 u=mistral | Wednesday 18 April 2018 20:01:41 -0400 (0:00:00.072) 0:11:19.185 *******
2018-04-18 20:01:41,573 p=26738 u=mistral | changed: [192.168.24.8] => (item={'caps': {'mds': u"''", 'osd': u"'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics'", 'mon': u"'allow r'", 'mgr': u"'allow *'"}, 'mode': u'0600', 'key': u'AQDroddaAAAAABAA64fr7tW0dXZue2Pl9Wi8Qg==', 'name': u'client.openstack'}) => {"changed": true, "gid": 0, "group": "root", "item": {"caps": {"mds": "''", "mgr": "'allow *'", "mon": "'allow r'", "osd": "'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=backups, allow rwx pool=vms, allow rwx pool=images, allow rwx pool=metrics'"}, "key": "AQDroddaAAAAABAA64fr7tW0dXZue2Pl9Wi8Qg==", "mode": "0600", "name": "client.openstack"}, "mode": "0600", "owner": "root", "path": "/etc/ceph/ceph.client.openstack.keyring", "secontext": "system_u:object_r:container_file_t:s0", "size": 262, "state": "file", "uid": 0}
2018-04-18 20:01:41,818 p=26738 u=mistral | failed: [192.168.24.8] (item={'caps': {'mds': u"'allow *'", 'osd': u"'allow rw'", 'mon': u'\'allow r, allow command \\\\\\"auth del\\\\\\", allow command \\\\\\"auth caps\\\\\\", allow command \\\\\\"auth get\\\\\\", allow command \\\\\\"auth get-or-create\\\\\\"\'', 'mgr': u"'allow *'"}, 'name': u'client.manila', 'key': u'AQC+t9daAAAAABAA0zzoLz1NwZbBxrHNk14I4g==', 'mode': u'0600'}) => {"changed": false, "item": {"caps": {"mds": "'allow *'", "mgr": "'allow *'", "mon": "'allow r, allow command \\\\\\\"auth del\\\\\\\", allow command \\\\\\\"auth caps\\\\\\\", allow command \\\\\\\"auth get\\\\\\\", allow command \\\\\\\"auth get-or-create\\\\\\\"'", "osd": "'allow rw'"}, "key": "AQC+t9daAAAAABAA0zzoLz1NwZbBxrHNk14I4g==", "mode": "0600", "name": "client.manila"}, "msg": "file (/etc/ceph/ceph.client.manila.keyring) is absent, cannot continue", "path": "/etc/ceph/ceph.client.manila.keyring", "state": "absent"}
2018-04-18 20:01:42,066 p=26738 u=mistral | failed: [192.168.24.8] (item={'caps': {'mds': u"''", 'osd': u"'allow rwx'", 'mon': u"'allow rw'", 'mgr': u"'allow *'"}, 'mode': u'0600', 'key': u'AQDroddaAAAAABAAtrsj06ioGk1GRO2T4XUgOw==', 'name': u'client.radosgw'}) => {"changed": false, "item": {"caps": {"mds": "''", "mgr": "'allow *'", "mon": "'allow rw'", "osd": "'allow rwx'"}, "key": "AQDroddaAAAAABAAtrsj06ioGk1GRO2T4XUgOw==", "mode": "0600", "name": "client.radosgw"}, "msg": "file (/etc/ceph/ceph.client.radosgw.keyring) is absent, cannot continue", "path": "/etc/ceph/ceph.client.radosgw.keyring", "state": "absent"}

/etc/ceph/ceph.client.radosgw.keyring and /etc/ceph/ceph.client.manila.keyring do not exist on the compute node (192.168.24.8) as these services were not enabled for the initial OSP10 deployment.

Version-Release number of selected component (if applicable):
ceph-ansible-3.1.0-0.1.beta6.el7cp.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy OSP10 with 3 controllers + 2 computes + 3 ceph osd nodes
2. Upgrade to OSP13 via the fast forward procedure

Actual results:
The Ceph upgrade stage fails while running ceph-ansible.

Expected results:
The Ceph upgrade succeeds.

Additional info:
Attaching /var/log/mistral/ceph-install-workflow.log and sosreports.
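For context on the error message: Ansible's `file` module with `state: file` only adjusts attributes of an existing path and fails when the path is absent, which is what the chmod task runs into for keyrings of services that were never enabled. A minimal standalone reproduction (a hypothetical playbook, not the actual ceph-ansible task):

```yaml
# Hypothetical reproduction: with state=file, the file module refuses to act
# on a missing path and fails with "file (...) is absent, cannot continue".
- hosts: localhost
  gather_facts: false
  tasks:
    - name: chmod a keyring that was never created
      file:
        path: /etc/ceph/ceph.client.manila.keyring  # absent when manila was not enabled
        state: file
        mode: "0600"
```

Run on a node without that keyring, this fails the same way as the log snippet above, which is why the upgrade aborts on nodes deployed before those services existed.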