Bug 1580100

Summary: [UPGRADE] OSP 12 -> 13 overcloud with non-collocated journal disks failed with error: Error EINVAL: bad entity name
Product: Red Hat OpenStack Reporter: Yogev Rabl <yrabl>
Component: ceph-ansible Assignee: Sébastien Han <shan>
Status: CLOSED DUPLICATE QA Contact: Yogev Rabl <yrabl>
Severity: high Docs Contact:
Priority: urgent    
Version: 13.0 (Queens) CC: gfidente, yrabl
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-05-21 17:00:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Yogev Rabl 2018-05-20 03:38:24 UTC
Description of problem:
A hyper-converged overcloud upgrade failed while running the command:

$ openstack overcloud upgrade converge \
--templates \
-e /home/stack/virt/internal.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/virt/network/network-environment.yaml \
-e /home/stack/virt/hostnames.yml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /home/stack/virt/debug.yaml \
-e /home/stack/virt/nodes_data.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/cinder-backup.yaml \
-e /home/stack/virt/ceph-min-osds.yaml \
-e /home/stack/virt/ceph-single-host-mode.yaml \
-e /home/stack/virt/docker-images.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/lifecycle/upgrade-converge.yaml \
-r /usr/share/openstack-tripleo-heat-templates/roles_data.yaml

The upgrade failed during the ceph-ansible run with the following errors (from /var/log/mistral/ceph-install-workflow.log):

2018-05-19 21:17:52,305 p=12968 u=mistral |  failed: [192.168.24.16] (item=192.168.24.19) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-0", "ceph", "--cluster", "ceph", "auth", "get-or-create", "mgr.controller-1", "mon", "allow profile mgr", "osd", "allow *", "mds", "allow *", "-o", "/etc/ceph/ceph.mgr.controller-1.keyring"], "delta": "0:00:00.444762", "end": "2018-05-20 01:17:50.280241", "item": "192.168.24.19", "msg": "non-zero return code", "rc": 22, "start": "2018-05-20 01:17:49.835479", "stderr": "Error EINVAL: bad entity name", "stderr_lines": ["Error EINVAL: bad entity name"], "stdout": "", "stdout_lines": []}
2018-05-19 21:17:53,094 p=12968 u=mistral |  failed: [192.168.24.16] (item=192.168.24.18) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-0", "ceph", "--cluster", "ceph", "auth", "get-or-create", "mgr.controller-2", "mon", "allow profile mgr", "osd", "allow *", "mds", "allow *", "-o", "/etc/ceph/ceph.mgr.controller-2.keyring"], "delta": "0:00:00.430806", "end": "2018-05-20 01:17:51.074694", "item": "192.168.24.18", "msg": "non-zero return code", "rc": 22, "start": "2018-05-20 01:17:50.643888", "stderr": "Error EINVAL: bad entity name", "stderr_lines": ["Error EINVAL: bad entity name"], "stdout": "", "stdout_lines": []}
2018-05-19 21:17:53,824 p=12968 u=mistral |  failed: [192.168.24.16] (item=192.168.24.16) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller-0", "ceph", "--cluster", "ceph", "auth", "get-or-create", "mgr.controller-0", "mon", "allow profile mgr", "osd", "allow *", "mds", "allow *", "-o", "/etc/ceph/ceph.mgr.controller-0.keyring"], "delta": "0:00:00.389069", "end": "2018-05-20 01:17:51.804882", "item": "192.168.24.16", "msg": "non-zero return code", "rc": 22, "start": "2018-05-20 01:17:51.415813", "stderr": "Error EINVAL: bad entity name", "stderr_lines": ["Error EINVAL: bad entity name"], "stdout": "", "stdout_lines": []}
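
For triage, the failure records above can be summarized with a short script. This is only a sketch, not part of ceph-ansible or TripleO: the function name parse_failed_line and the SAMPLE string are hypothetical, SAMPLE is the controller-0 record trimmed to the fields the script reads, and it assumes each record sits on a single line (wrapped records would need to be joined first). It pulls out the target host, the loop item, the entity name passed to "ceph auth get-or-create", the return code, and stderr.

```python
import json
import re

# Matches an Ansible "failed:" record in the form seen in the log excerpt:
#   failed: [<host>] (item=<item>) => {<JSON payload>}
FAILED_RE = re.compile(
    r"failed: \[(?P<host>[^\]]+)\] \(item=(?P<item>[^)]+)\) => (?P<payload>\{.*\})\s*$"
)


def parse_failed_line(line):
    """Summarize one 'failed:' log line; return None if the line does not match."""
    match = FAILED_RE.search(line)
    if match is None:
        return None
    payload = json.loads(match.group("payload"))
    cmd = payload.get("cmd", [])
    # The entity name is the argument that follows "get-or-create" in the command.
    entity = None
    if "get-or-create" in cmd:
        idx = cmd.index("get-or-create")
        if idx + 1 < len(cmd):
            entity = cmd[idx + 1]
    return {
        "host": match.group("host"),
        "item": match.group("item"),
        "entity": entity,
        "rc": payload.get("rc"),
        "stderr": payload.get("stderr"),
    }


# Hypothetical sample: the controller-0 record, trimmed to the fields used here.
SAMPLE = (
    '2018-05-19 21:17:53,824 p=12968 u=mistral |  failed: [192.168.24.16] '
    '(item=192.168.24.16) => {"changed": false, "cmd": ["docker", "exec", '
    '"ceph-mon-controller-0", "ceph", "--cluster", "ceph", "auth", '
    '"get-or-create", "mgr.controller-0", "mon", "allow profile mgr", '
    '"osd", "allow *", "mds", "allow *", "-o", '
    '"/etc/ceph/ceph.mgr.controller-0.keyring"], "rc": 22, '
    '"stderr": "Error EINVAL: bad entity name"}'
)

result = parse_failed_line(SAMPLE)
print(result)
```

In all three records the command runs inside the ceph-mon-controller-0 container on 192.168.24.16, the entity is mgr.controller-N, and the return code is 22, which is the Linux errno value for EINVAL.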


Version-Release number of selected component (if applicable):
ceph-ansible-3.1.0-0.1.rc3.el7cp.noarch
puppet-tripleo-8.3.2-6.el7ost.noarch
ansible-tripleo-ipsec-8.1.1-0.20180308133440.8f5369a.el7ost.noarch
openstack-tripleo-common-8.6.1-12.el7ost.noarch
openstack-tripleo-validations-8.4.1-5.el7ost.noarch
openstack-tripleo-common-containers-8.6.1-12.el7ost.noarch
python-tripleoclient-9.2.1-9.el7ost.noarch
openstack-tripleo-image-elements-8.0.1-1.el7ost.noarch
openstack-tripleo-heat-templates-8.0.2-22.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.0-2.el7ost.noarch


How reproducible:
Unknown

Steps to Reproduce:
1. Deploy an HCI overcloud on the latest OSP 12
2. Run the upgrade process (including updating roles_data.yaml)

Actual results:
The upgrade failed during the update of Ceph

Expected results:
The upgrade finishes successfully

Additional info:

Comment 1 Yogev Rabl 2018-05-20 03:39:26 UTC
My mistake: the description says this is an HCI deployment, but it is not. It is a monolithic deployment with 3 controller nodes, 2 compute nodes, and 1 Ceph storage node with 5 OSDs.

Comment 2 Sébastien Han 2018-05-20 10:12:24 UTC
I thought this has been verified here: https://bugzilla.redhat.com/show_bug.cgi?id=1574995

Yogev, what's the difference?
Thanks.

Comment 3 Yogev Rabl 2018-05-21 16:34:30 UTC
(In reply to leseb from comment #2)
> I thought this has been verified here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1574995
> 
> Yogev, what's the difference?
> Thanks.

I thought the same thing. I am reopening that bug; I just wanted to have a record of it in the RH OpenStack domain and make this bug depend on that one.