Description of problem:
-----------------------
Upgrade of ceph cluster failed:

openstack overcloud external-upgrade run \
    --stack qe-Cloud-0 \
    --tags ceph 2>&1
...
"skipping: [controller-2] => (item=[u'/var/lib/ceph/bootstrap-mds/ceph.keyring', {'_ansible_parsed': True, u'stat': {u'charset': u'unknown', u'uid': 42430, u'exists': True, u'attr_flags': u'', u'woth': False, u'isreg': True, u'device_type': 0, u'mtime': 1542215850.0, u'block_size': 4096, u'inode': 90283582, u'isgid': False, u'size': 113, u'executable': False, u'roth': True, u'isuid': False, u'readable': True, u'version': None, u'pw_name': u'mistral', u'gid': 42430, u'ischr': False, u'wusr': True, u'writeable': True, u'isdir': False, u'blocks': 8, u'xoth': False, u'rusr': True, u'nlink': 1, u'issock': False, u'rgrp': True, u'gr_name': u'mistral', u'path': u'/var/lib/mistral/f956876e-3d55-4d06-b918-639d66d754d5/ceph-ansible/fetch_dir/69f8f286-e7fc-11e8-8d93-5254009463af//var/lib/ceph/bootstrap-mds/ceph.keyring', u'xusr': False, u'atime': 1542373239.294403, u'mimetype': u'unknown', u'ctime': 1542373239.294403, u'isblk': False, u'checksum': u'242959f387e1be802c678d37089baab27172e8ef', u'dev': 64769, u'wgrp': False, u'isfifo': False, u'mode': u'0644', u'xgrp': False, u'islnk': False, u'attributes': []}, '_ansible_item_result': True, '_ansible_no_log': False, '_ansible_delegated_vars': {'ansible_delegated_host': u'localhost', 'ansible_host': u'localhost'}, u'changed': False, 'failed': False, 'item': u'/var/lib/ceph/bootstrap-mds/ceph.keyring', u'invocation': {u'module_args': {u'checksum_algorithm': u'sha1', u'get_checksum': True, u'follow': False, u'path': u'/var/lib/mistral/f956876e-3d55-4d06-b918-639d66d754d5/ceph-ansible/fetch_dir/69f8f286-e7fc-11e8-8d93-5254009463af//var/lib/ceph/bootstrap-mds/ceph.keyring', u'get_md5': None, u'get_mime': True, u'get_attributes': True}}, 'failed_when_result': False, '_ansible_ignore_errors': None, '_ansible_item_label': u'/var/lib/ceph/bootstrap-mds/ceph.keyring'}]) => 
{\"changed\": false, \"item\": [\"/var/lib/ceph/bootstrap-mds/ceph.keyring\", {\"_ansible_delegated_vars\": {\"ansible_delegated_host\": \"localhost\", \"ansible_host\": \"localhost\"}, \"_ansible_ignore_errors\": null, \"_ansible_item_label\": \"/var/lib/ceph/bootstrap-mds/ceph.keyring\", \"_ansible_item_result\": true, \"_ansible_no_log\": false, \"_ansible_parsed\": true, \"changed\": false, \"failed\": false, \"failed_when_result\": false, \"invocation\": {\"module_args\": {\"checksum_algorithm\": \"sha1\", \"follow\": false, \"get_attributes\": true, \"get_checksum\": true, \"get_md5\": null, \"get_mime\": true, \"path\": \"/var/lib/mistral/f956876e-3d55-4d06-b918-639d66d754d5/ceph-ansible/fetch_dir/69f8f286-e7fc-11e8-8d93-5254009463af//var/lib/ceph/bootstrap-mds/ceph.keyring\"}}, \"item\": \"/var/lib/ceph/bootstrap-mds/ceph.keyring\", \"stat\": {\"atime\": 1542373239.294403, \"attr_flags\": \"\", \"attributes\": [], \"block_size\": 4096, \"blocks\": 8, \"charset\": \"unknown\", \"checksum\": \"242959f387e1be802c678d37089baab27172e8ef\", \"ctime\": 1542373239.294403, \"dev\": 64769, \"device_type\": 0, \"executable\": false, \"exists\": true, \"gid\": 42430, \"gr_name\": \"mistral\", \"inode\": 90283582, \"isblk\": false, \"ischr\": false, \"isdir\": false, \"isfifo\": false, \"isgid\": false, \"islnk\": false, \"isreg\": true, \"issock\": false, \"isuid\": false, \"mimetype\": \"unknown\", \"mode\": \"0644\", \"mtime\": 1542215850.0, \"nlink\": 1, \"path\": \"/var/lib/mistral/f956876e-3d55-4d06-b918-639d66d754d5/ceph-ansible/fetch_dir/69f8f286-e7fc-11e8-8d93-5254009463af//var/lib/ceph/bootstrap-mds/ceph.keyring\", \"pw_name\": \"mistral\", \"readable\": true, \"rgrp\": true, \"roth\": true, \"rusr\": true, \"size\": 113, \"uid\": 42430, \"version\": null, \"wgrp\": false, \"woth\": false, \"writeable\": true, \"wusr\": true, \"xgrp\": false, \"xoth\": false, \"xusr\": false}}], \"skip_reason\": \"Conditional result was False\"}", "failed: [controller-2] 
(item=[u'/var/lib/ceph/bootstrap-rbd/ceph.keyring', {'_ansible_parsed': True, u'stat': {u'exists': False}, '_ansible_item_result': True, '_ansible_no_log': False, '_ansible_delegated_vars': {'ansible_delegated_host': u'localhost', 'ansible_host': u'localhost'}, u'changed': False, 'failed': False, 'item': u'/var/lib/ceph/bootstrap-rbd/ceph.keyring', u'invocation': {u'module_args': {u'checksum_algorithm': u'sha1', u'get_checksum': True, u'follow': False, u'path': u'/var/lib/mistral/f956876e-3d55-4d06-b918-639d66d754d5/ceph-ansible/fetch_dir/69f8f286-e7fc-11e8-8d93-5254009463af//var/lib/ceph/bootstrap-rbd/ceph.keyring', u'get_md5': None, u'get_mime': True, u'get_attributes': True}}, 'failed_when_result': False, '_ansible_ignore_errors': None, '_ansible_item_label': u'/var/lib/ceph/bootstrap-rbd/ceph.keyring'}]) => {\"changed\": false, \"item\": [\"/var/lib/ceph/bootstrap-rbd/ceph.keyring\", {\"_ansible_delegated_vars\": {\"ansible_delegated_host\": \"localhost\", \"ansible_host\": \"localhost\"}, \"_ansible_ignore_errors\": null, \"_ansible_item_label\": \"/var/lib/ceph/bootstrap-rbd/ceph.keyring\", \"_ansible_item_result\": true, \"_ansible_no_log\": false, \"_ansible_parsed\": true, \"changed\": false, \"failed\": false, \"failed_when_result\": false, \"invocation\": {\"module_args\": {\"checksum_algorithm\": \"sha1\", \"follow\": false, \"get_attributes\": true, \"get_checksum\": true, \"get_md5\": null, \"get_mime\": true, \"path\": \"/var/lib/mistral/f956876e-3d55-4d06-b918-639d66d754d5/ceph-ansible/fetch_dir/69f8f286-e7fc-11e8-8d93-5254009463af//var/lib/ceph/bootstrap-rbd/ceph.keyring\"}}, \"item\": \"/var/lib/ceph/bootstrap-rbd/ceph.keyring\", \"stat\": {\"exists\": false}}], \"msg\": \"file not found: /var/lib/ceph/bootstrap-rbd/ceph.keyring\"}",
"NO MORE HOSTS LEFT *************************************************************",
"PLAY RECAP *********************************************************************",
"ceph-0 : ok=2 changed=0 unreachable=0 failed=0 ",
"ceph-1 : ok=2 changed=0 unreachable=0 failed=0 ",
"ceph-2 : ok=2 changed=0 unreachable=0 failed=0 ",
"compute-0 : ok=2 changed=0 unreachable=0 failed=0 ",
"compute-1 : ok=2 changed=0 unreachable=0 failed=0 ",
"controller-0 : ok=2 changed=0 unreachable=0 failed=0 ",
"controller-1 : ok=2 changed=0 unreachable=0 failed=0 ",
"controller-2 : ok=53 changed=6 unreachable=0 failed=1 ",
"localhost : ok=1 changed=0 unreachable=0 failed=0 ",
"Friday 16 November 2018 08:02:09 -0500 (0:00:00.659) 0:01:25.761 ******* ",
"=============================================================================== "

File /var/lib/ceph/bootstrap-rbd/ceph.keyring is missing on all controller nodes:

for i in 8 16 15
> do
> ssh heat-admin@192.168.24.${i} 'sudo ls -l /var/lib/ceph/bootstrap-rbd/'
> done
Warning: Permanently added '192.168.24.8' (ECDSA) to the list of known hosts.
total 0
Warning: Permanently added '192.168.24.16' (ECDSA) to the list of known hosts.
total 0
Warning: Permanently added '192.168.24.15' (ECDSA) to the list of known hosts.
total 0

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-tripleo-heat-templates-9.0.1-0.20181013060879.el7ost.noarch
puppet-ceph-2.5.2-0.20181013125935.f8d3fd2.el7ost.noarch
ceph-ansible-3.1.10-1.el7cp.noarch

Steps to Reproduce:
-------------------
1. Deploy RHOS-12 and upgrade it to RHOS-13
2. Follow the upgrade procedure for RHOS-14
3. The step for the ceph cluster upgrade fails

Expected results:
-----------------
Ceph cluster is successfully upgraded

Additional info:
----------------
Virtual environment: 3 controllers + 2 computes + 3 ceph nodes
ROOT CAUSE:
fetch directory was missing bootstrap-rbd key

WORKAROUND:
- extract bootstrap-rbd key from running ceph cluster
- download fetch directory tarball and modify it to contain extracted bootstrap-rbd key
- upload modified fetch directory tarball
- resume upgrade

HOW TO IMPLEMENT WORKAROUND:
- Run the following on your undercloud and then resume the upgrade.
- Set $IP to ip address of the node running Ceph Monitor service (usually a controller node).

IP=192.168.24.8
source ~/stackrc
mkdir working_dir
pushd working_dir
STACK=$(openstack stack list -c "Stack Name" -f value)
CONTAINER=$(echo $(echo $STACK)_ceph_ansible_fetch_dir)
swift download $CONTAINER
TARBALL=$(ls *.tar.gz)
tar xf $TARBALL
ssh heat-admin@$IP "ceph auth get-or-create client.bootstrap-osd " > ceph.keyring
UUID=$(cat ceph_cluster_uuid.conf)
mkdir $UUID/var/lib/ceph/bootstrap-rbd/
mv ceph.keyring $UUID/var/lib/ceph/bootstrap-rbd/
rm $TARBALL
tar cvfz $TARBALL *
swift upload $CONTAINER $TARBALL
popd

FOLLOW UP:
A proposed fix will be posted to osp12 which should prevent this situation but I want to verify with the same test once the patch is available. Note: it didn't come up during upgrade from 13 to 14.
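
Before uploading, it may be worth confirming that the repacked tarball really lists the bootstrap-rbd keyring. The following is a minimal sketch, not part of the original workaround: the function name tarball_has_rbd_key is hypothetical, and it assumes the tarball is gzip-compressed and stores the key under <UUID>/var/lib/ceph/bootstrap-rbd/ceph.keyring, as the commands above produce.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): succeed only if the
# given tarball lists a bootstrap-rbd keyring among its members.
tarball_has_rbd_key() {
    # tar -tzf lists the members of a gzip-compressed archive without
    # extracting them; grep -q sets the exit status and prints nothing.
    tar -tzf "$1" 2>/dev/null | grep -q 'var/lib/ceph/bootstrap-rbd/ceph.keyring$'
}
```

One possible use, run inside working_dir just before the upload step: tarball_has_rbd_key "$TARBALL" && swift upload "$CONTAINER" "$TARBALL"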
Further analysis:

The root cause, a missing bootstrap-rbd key [0][1], remains the same, but the proposed fix of somehow gathering it in OSP12 won't help this bug. Instead I'm requesting that ceph-ansible not fail because the bootstrap-rbd key is not present when it is not needed.

The "push ceph files to the ansible server" task [2] didn't fail during the 12>13 upgrade because Jewel was running during that upgrade and {{ceph_config_keys}} is only set to gather the RBD key when the version is Luminous [3]. During the 13>14 upgrade, the version was Luminous, so the same tasks [3] added the RBD key path to the {{ceph_config_keys}} list, but when the "push ceph files to the ansible server" task [2] ran, the key wasn't there because the cluster was originally installed with Jewel. Note that the fetch directory did store other keys [4] but not the RBD mirror key.

In case it helps, this cluster wasn't running the RBD mirror service, so it might help to simply not append the RBD mirror key to {{ceph_config_keys}} if there are no members in the RBD mirror role.

[0] http://ix.io/1s6o
[1] http://paste.openstack.org/show/734923/
[2] https://github.com/ceph/ceph-ansible/blob/v3.1.10/roles/ceph-mon/tasks/docker/fetch_configs.yml
[3] https://github.com/ceph/ceph-ansible/blob/v3.1.10/roles/ceph-mon/tasks/docker/copy_configs.yml#L15
[4] http://paste.openstack.org/show/734926/
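
The failure mode described here (a key path is in the expected list but the file is absent from the fetch directory) can be checked for ahead of an upgrade. This is only a sketch under assumptions: missing_keys is a hypothetical helper, not something ceph-ansible provides, and it assumes keys are stored under the fetch directory at their absolute path, as the paths in the log output suggest.

```shell
#!/bin/sh
# Hypothetical helper (not part of ceph-ansible): print every expected
# key path that has no corresponding file under the fetch directory.
missing_keys() {
    fetch_dir=$1
    shift
    for key in "$@"; do
        # Assumption: the fetch directory mirrors absolute paths, e.g.
        # <fetch_dir>//var/lib/ceph/bootstrap-rbd/ceph.keyring
        [ -e "$fetch_dir/$key" ] || printf '%s\n' "$key"
    done
}
```

For example, with the extracted tarball from the workaround, missing_keys "$UUID" /var/lib/ceph/bootstrap-mds/ceph.keyring /var/lib/ceph/bootstrap-rbd/ceph.keyring would print only the paths that are absent.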
This was introduced in https://github.com/ceph/ceph-ansible/releases/tag/v3.2.0beta6 via https://github.com/ceph/ceph-ansible/commit/40b7747af7b3d139b3017b53f78ab52fd1082a92.
(In reply to leseb from comment #9)
> This was introduced in
> https://github.com/ceph/ceph-ansible/releases/tag/v3.2.0beta6 via
> https://github.com/ceph/ceph-ansible/commit/40b7747af7b3d139b3017b53f78ab52fd1082a92.

This happened in ceph-ansible-3.1.10-1.el7cp but I don't think the patch [1] was backported to 3.1

[1] https://github.com/ceph/ceph-ansible/commit/40b7747af7b3d139b3017b53f78ab52fd1082a92
Fix present in https://github.com/ceph/ceph-ansible/releases/tag/v3.1.11
In https://github.com/ceph/ceph-ansible/releases/tag/v3.2.0rc5
Created attachment 1509799: ceph_ansible_command.log
Sorry guys, https://github.com/ceph/ceph-ansible/releases/tag/v3.1.12
Yes, today.
verified
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0020