Bug 1650572 - [UPGRADES][12-13-14] Failed to upgrade ceph: file not found: /var/lib/ceph/bootstrap-rbd/ceph.keyring
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Ansible
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: urgent
Target Milestone: rc
Target Release: 3.2
Assignee: Sébastien Han
QA Contact: Yogev Rabl
URL:
Whiteboard:
Depends On:
Blocks: 1553640 1578730
Reported: 2018-11-16 14:05 UTC by Yurii Prokulevych
Modified: 2024-06-13 21:59 UTC (History)

Fixed In Version: RHEL: ceph-ansible-3.2.0-0.1.rc7.el7cp Ubuntu: ceph-ansible_3.2.0~rc7-2redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-03 19:02:25 UTC
Embargoed:


Attachments
ceph_ansible_command.log (331.00 KB, text/plain)
2018-11-29 12:31 UTC, Giulio Fidente


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | ceph ceph-ansible pull 3352 | 0 | None | closed | Create RBD keys on upgrade | 2020-10-07 09:31:09 UTC
Github | ceph ceph-ansible pull 3389 | 0 | None | closed | rolling_update: do not fail on missing keys | 2020-10-07 09:31:08 UTC
Red Hat Product Errata | RHBA-2019:0020 | 0 | None | None | None | 2019-01-03 19:02:34 UTC

Description Yurii Prokulevych 2018-11-16 14:05:48 UTC
Description of problem:
-----------------------
Upgrade of ceph cluster failed:

openstack overcloud external-upgrade run \
    --stack qe-Cloud-0 \
    --tags ceph 2>&1
...
        "skipping: [controller-2] => (item=[u'/var/lib/ceph/bootstrap-mds/ceph.keyring', {'_ansible_parsed': True, u'stat': {u'charset': u'unknown', u'uid': 42430, u'exists': True, u'attr_flags': u'', u'woth': False, u'isreg': True, u'device_type': 0, u'mtime': 1542215850.0, u'block_size': 4096, u'inode': 90283582, u'isgid': False, u'size': 113, u'executable': False, u'roth': True, u'isuid': False, u'readable': True, u'version': None, u'pw_name': u'mistral', u'gid': 42430, u'ischr': False, u'wusr': True, u'writeable': True, u'isdir': False, u'blocks': 8, u'xoth': False, u'rusr': True, u'nlink': 1, u'issock': False, u'rgrp': True, u'gr_name': u'mistral', u'path': u'/var/lib/mistral/f956876e-3d55-4d06-b918-639d66d754d5/ceph-ansible/fetch_dir/69f8f286-e7fc-11e8-8d93-5254009463af//var/lib/ceph/bootstrap-mds/ceph.keyring', u'xusr': False, u'atime': 1542373239.294403, u'mimetype': u'unknown', u'ctime': 1542373239.294403, u'isblk': False, u'checksum': u'242959f387e1be802c678d37089baab27172e8ef', u'dev': 64769, u'wgrp': False, u'isfifo': False, u'mode': u'0644', u'xgrp': False, u'islnk': False, u'attributes': []}, '_ansible_item_result': True, '_ansible_no_log': False, '_ansible_delegated_vars': {'ansible_delegated_host': u'localhost', 'ansible_host': u'localhost'}, u'changed': False, 'failed': False, 'item': u'/var/lib/ceph/bootstrap-mds/ceph.keyring', u'invocation': {u'module_args': {u'checksum_algorithm': u'sha1', u'get_checksum': True, u'follow': False, u'path': u'/var/lib/mistral/f956876e-3d55-4d06-b918-639d66d754d5/ceph-ansible/fetch_dir/69f8f286-e7fc-11e8-8d93-5254009463af//var/lib/ceph/bootstrap-mds/ceph.keyring', u'get_md5': None, u'get_mime': True, u'get_attributes': True}}, 'failed_when_result': False, '_ansible_ignore_errors': None, '_ansible_item_label': u'/var/lib/ceph/bootstrap-mds/ceph.keyring'}])  => {\"changed\": false, \"item\": [\"/var/lib/ceph/bootstrap-mds/ceph.keyring\", {\"_ansible_delegated_vars\": {\"ansible_delegated_host\": \"localhost\", 
\"ansible_host\": \"localhost\"}, \"_ansible_ignore_errors\": null, \"_ansible_item_label\": \"/var/lib/ceph/bootstrap-mds/ceph.keyring\", \"_ansible_item_result\": true, \"_ansible_no_log\": false, \"_ansible_parsed\": true, \"changed\": false, \"failed\": false, \"failed_when_result\": false, \"invocation\": {\"module_args\": {\"checksum_algorithm\": \"sha1\", \"follow\": false, \"get_attributes\": true, \"get_checksum\": true, \"get_md5\": null, \"get_mime\": true, \"path\": \"/var/lib/mistral/f956876e-3d55-4d06-b918-639d66d754d5/ceph-ansible/fetch_dir/69f8f286-e7fc-11e8-8d93-5254009463af//var/lib/ceph/bootstrap-mds/ceph.keyring\"}}, \"item\": \"/var/lib/ceph/bootstrap-mds/ceph.keyring\", \"stat\": {\"atime\": 1542373239.294403, \"attr_flags\": \"\", \"attributes\": [], \"block_size\": 4096, \"blocks\": 8, \"charset\": \"unknown\", \"checksum\": \"242959f387e1be802c678d37089baab27172e8ef\", \"ctime\": 1542373239.294403, \"dev\": 64769, \"device_type\": 0, \"executable\": false, \"exists\": true, \"gid\": 42430, \"gr_name\": \"mistral\", \"inode\": 90283582, \"isblk\": false, \"ischr\": false, \"isdir\": false, \"isfifo\": false, \"isgid\": false, \"islnk\": false, \"isreg\": true, \"issock\": false, \"isuid\": false, \"mimetype\": \"unknown\", \"mode\": \"0644\", \"mtime\": 1542215850.0, \"nlink\": 1, \"path\": \"/var/lib/mistral/f956876e-3d55-4d06-b918-639d66d754d5/ceph-ansible/fetch_dir/69f8f286-e7fc-11e8-8d93-5254009463af//var/lib/ceph/bootstrap-mds/ceph.keyring\", \"pw_name\": \"mistral\", \"readable\": true, \"rgrp\": true, \"roth\": true, \"rusr\": true, \"size\": 113, \"uid\": 42430, \"version\": null, \"wgrp\": false, \"woth\": false, \"writeable\": true, \"wusr\": true, \"xgrp\": false, \"xoth\": false, \"xusr\": false}}], \"skip_reason\": \"Conditional result was False\"}", 
        "failed: [controller-2] (item=[u'/var/lib/ceph/bootstrap-rbd/ceph.keyring', {'_ansible_parsed': True, u'stat': {u'exists': False}, '_ansible_item_result': True, '_ansible_no_log': False, '_ansible_delegated_vars': {'ansible_delegated_host': u'localhost', 'ansible_host': u'localhost'}, u'changed': False, 'failed': False, 'item': u'/var/lib/ceph/bootstrap-rbd/ceph.keyring', u'invocation': {u'module_args': {u'checksum_algorithm': u'sha1', u'get_checksum': True, u'follow': False, u'path': u'/var/lib/mistral/f956876e-3d55-4d06-b918-639d66d754d5/ceph-ansible/fetch_dir/69f8f286-e7fc-11e8-8d93-5254009463af//var/lib/ceph/bootstrap-rbd/ceph.keyring', u'get_md5': None, u'get_mime': True, u'get_attributes': True}}, 'failed_when_result': False, '_ansible_ignore_errors': None, '_ansible_item_label': u'/var/lib/ceph/bootstrap-rbd/ceph.keyring'}]) => {\"changed\": false, \"item\": [\"/var/lib/ceph/bootstrap-rbd/ceph.keyring\", {\"_ansible_delegated_vars\": {\"ansible_delegated_host\": \"localhost\", \"ansible_host\": \"localhost\"}, \"_ansible_ignore_errors\": null, \"_ansible_item_label\": \"/var/lib/ceph/bootstrap-rbd/ceph.keyring\", \"_ansible_item_result\": true, \"_ansible_no_log\": false, \"_ansible_parsed\": true, \"changed\": false, \"failed\": false, \"failed_when_result\": false, \"invocation\": {\"module_args\": {\"checksum_algorithm\": \"sha1\", \"follow\": false, \"get_attributes\": true, \"get_checksum\": true, \"get_md5\": null, \"get_mime\": true, \"path\": \"/var/lib/mistral/f956876e-3d55-4d06-b918-639d66d754d5/ceph-ansible/fetch_dir/69f8f286-e7fc-11e8-8d93-5254009463af//var/lib/ceph/bootstrap-rbd/ceph.keyring\"}}, \"item\": \"/var/lib/ceph/bootstrap-rbd/ceph.keyring\", \"stat\": {\"exists\": false}}], \"msg\": \"file not found: /var/lib/ceph/bootstrap-rbd/ceph.keyring\"}", 
        "NO MORE HOSTS LEFT *************************************************************", 
        "PLAY RECAP *********************************************************************", 
        "ceph-0                     : ok=2    changed=0    unreachable=0    failed=0   ", 
        "ceph-1                     : ok=2    changed=0    unreachable=0    failed=0   ", 
        "ceph-2                     : ok=2    changed=0    unreachable=0    failed=0   ", 
        "compute-0                  : ok=2    changed=0    unreachable=0    failed=0   ", 
        "compute-1                  : ok=2    changed=0    unreachable=0    failed=0   ", 
        "controller-0               : ok=2    changed=0    unreachable=0    failed=0   ", 
        "controller-1               : ok=2    changed=0    unreachable=0    failed=0   ", 
        "controller-2               : ok=53   changed=6    unreachable=0    failed=1   ", 
        "localhost                  : ok=1    changed=0    unreachable=0    failed=0   ", 
        "Friday 16 November 2018  08:02:09 -0500 (0:00:00.659)       0:01:25.761 ******* ", 
        "=============================================================================== "

File /var/lib/ceph/bootstrap-rbd/ceph.keyring is missing on all controller nodes:

for i in 8 16 15
> do
>     ssh heat-admin@192.168.24.${i} 'sudo ls -l /var/lib/ceph/bootstrap-rbd/'
> done
Warning: Permanently added '192.168.24.8' (ECDSA) to the list of known hosts.
total 0
Warning: Permanently added '192.168.24.16' (ECDSA) to the list of known hosts.
total 0
Warning: Permanently added '192.168.24.15' (ECDSA) to the list of known hosts.
total 0


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
openstack-tripleo-heat-templates-9.0.1-0.20181013060879.el7ost.noarch
puppet-ceph-2.5.2-0.20181013125935.f8d3fd2.el7ost.noarch
ceph-ansible-3.1.10-1.el7cp.noarch


Steps to Reproduce:
-------------------
1. Deploy RHOS-12 and upgrade it to RHOS-13
2. Follow upgrade procedure for RHOS-14
3. Step for ceph cluster upgrade fails

Expected results:
-----------------
Ceph cluster is successfully upgraded

Additional info:
----------------
Virtual environment: 3 controllers + 2 computes + 3 ceph nodes

Comment 2 John Fulton 2018-11-16 16:44:11 UTC
ROOT CAUSE: fetch directory was missing bootstrap-rbd key

WORKAROUND:
- extract bootstrap-rbd key from running ceph cluster
- download fetch directory tarball and modify it to contain extracted bootstrap-rbd key
- upload modified fetch directory tarball
- resume upgrade

HOW TO IMPLEMENT WORKAROUND:
- Run the following on your undercloud and then resume the upgrade.
- Set $IP to the IP address of a node running the Ceph Monitor service (usually a controller node).


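# IP of a node running the Ceph Monitor service (usually a controller)
IP=192.168.24.8
source ~/stackrc
mkdir working_dir
pushd working_dir
# Download the fetch directory tarball from the stack's Swift container
STACK=$(openstack stack list -c "Stack Name" -f value)
CONTAINER="${STACK}_ceph_ansible_fetch_dir"
swift download "$CONTAINER"
TARBALL=$(ls *.tar.gz)
tar xf "$TARBALL"
# Extract the bootstrap-rbd key from the running cluster
ssh heat-admin@$IP "ceph auth get-or-create client.bootstrap-rbd" > ceph.keyring
# Place the key under the bootstrap-rbd path inside the fetch directory
UUID=$(cat ceph_cluster_uuid.conf)
mkdir -p "$UUID/var/lib/ceph/bootstrap-rbd/"
mv ceph.keyring "$UUID/var/lib/ceph/bootstrap-rbd/"
# Rebuild the tarball and upload it back to the same container
rm "$TARBALL"
tar cvfz "$TARBALL" *
swift upload "$CONTAINER" "$TARBALL"
popd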
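
Before uploading, it may be worth confirming that the rebuilt tarball really contains the bootstrap-rbd keyring. The following is a minimal sketch, not part of the original workaround; the function name tarball_has_rbd_key is hypothetical, and it assumes the tarball is gzip-compressed as in the steps above.

```shell
# Hypothetical helper (an assumption, not from the original workaround):
# succeeds only if the tarball lists a bootstrap-rbd keyring file.
tarball_has_rbd_key() {
    tarball="$1"
    # List the archive members and look for the keyring path; grep reads
    # the whole listing, so tar is never cut off by a closed pipe.
    tar tzf "$tarball" | grep 'var/lib/ceph/bootstrap-rbd/ceph\.keyring$' > /dev/null
}

# Example use inside working_dir, after the tarball has been rebuilt:
#   tarball_has_rbd_key "$TARBALL" && echo "bootstrap-rbd key present"
```

If the check fails, the key was not placed under the directory named by ceph_cluster_uuid.conf before the tarball was rebuilt.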


FOLLOW UP:
A proposed fix will be posted to osp12 which should prevent this situation but I want to verify with the same test once the patch is available. Note: it didn't come up during upgrade from 13 to 14.

Comment 4 John Fulton 2018-11-19 20:47:02 UTC
Further analysis:

The root cause, a missing bootstrap-rbd key [0][1], remains the same but the proposed fix of somehow gathering it in OSP12 won't help this bug. Instead I'm requesting that ceph-ansible not fail because the bootstrap-rbd key is not present when it is not needed.
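
To illustrate the requested behaviour, here is a minimal shell sketch that copies whichever keys exist and skips absent ones instead of failing. It is not ceph-ansible's implementation; the function name copy_present_keys and its argument layout are assumptions made for illustration only.

```shell
# Hypothetical helper (an assumption, not ceph-ansible code).
# Usage: copy_present_keys SRC_ROOT DEST_ROOT KEY_PATH...
# Copies each key that exists under SRC_ROOT to the same path under
# DEST_ROOT; a missing key is reported on stderr and skipped.
copy_present_keys() {
    src_root="$1"
    dest_root="$2"
    shift 2
    for key in "$@"; do
        if [ -f "${src_root}${key}" ]; then
            mkdir -p "${dest_root}$(dirname "$key")" || return 1
            cp "${src_root}${key}" "${dest_root}${key}" || return 1
        else
            echo "skipping missing key: ${key}" >&2
        fi
    done
    return 0
}
```

A real copy error still returns non-zero; only an absent key is tolerated.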

The "push ceph files to the ansible server" task [2] didn't fail during the 12>13 upgrade because Jewel was running during that upgrade and {{ceph_config_keys}} is only set to gather the RBD key when the version is Luminous [3].

During the 13>14 upgrade, the version was Luminous, so the same tasks [3] added the RBD key path to the {{ceph_config_keys}} list, but when the "push ceph files to the ansible server" task [2] ran, the key wasn't there because the cluster was originally installed with Jewel. Note that the fetch directory did store other keys [4] but not the RBD mirror key.

In case it helps, this cluster wasn't running the RBD mirror service, so one option is to simply not append the RBD mirror key to {{ceph_config_keys}} if there are no members in the RBD mirror role.
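
The condition suggested above can be sketched in shell as follows. This is an illustration under stated assumptions, not the ceph-ansible logic: build_key_list is a hypothetical name, the release is compared as a plain string, and the base list is reduced to the bootstrap-mds key seen in the log.

```shell
# Hypothetical sketch (an assumption, not ceph-ansible code).
# Usage: build_key_list RELEASE RBD_MIRROR_HOST_COUNT
# Prints one key path per line.
build_key_list() {
    release="$1"
    rbd_mirror_hosts="$2"
    # Key gathered regardless of release in this simplified sketch.
    echo /var/lib/ceph/bootstrap-mds/ceph.keyring
    # Append the RBD key only on Luminous and only when the RBD mirror
    # role has at least one member.
    if [ "$release" = "luminous" ] && [ "$rbd_mirror_hosts" -gt 0 ]; then
        echo /var/lib/ceph/bootstrap-rbd/ceph.keyring
    fi
}
```

With this rule, a Luminous cluster with no RBD mirror hosts would never ask for the bootstrap-rbd key.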

[0] http://ix.io/1s6o
[1] http://paste.openstack.org/show/734923/
[2] https://github.com/ceph/ceph-ansible/blob/v3.1.10/roles/ceph-mon/tasks/docker/fetch_configs.yml
[3] https://github.com/ceph/ceph-ansible/blob/v3.1.10/roles/ceph-mon/tasks/docker/copy_configs.yml#L15
[4] http://paste.openstack.org/show/734926/

Comment 10 John Fulton 2018-11-20 15:10:07 UTC
(In reply to leseb from comment #9)
> This was introduced in
> https://github.com/ceph/ceph-ansible/releases/tag/v3.2.0beta6 via
> https://github.com/ceph/ceph-ansible/commit/40b7747af7b3d139b3017b53f78ab52fd1082a92.

This happened in ceph-ansible-3.1.10-1.el7cp but I don't think the patch [1] was backported to 3.1

[1] https://github.com/ceph/ceph-ansible/commit/40b7747af7b3d139b3017b53f78ab52fd1082a92

Comment 11 seb 2018-11-27 13:36:37 UTC
Fix present in https://github.com/ceph/ceph-ansible/releases/tag/v3.1.11

Comment 17 Giulio Fidente 2018-11-29 12:31:09 UTC
Created attachment 1509799 [details]
ceph_ansible_command.log

Comment 24 seb 2018-12-03 12:02:36 UTC
Yes, today.

Comment 34 Yogev Rabl 2018-12-15 03:22:39 UTC
verified

Comment 36 errata-xmlrpc 2019-01-03 19:02:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0020

